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Abstract 

Marine  mammal  whistle  calls  present  an  attractive  medium  for  covert  underwater 
communications.  High  quality  models  of  the  whistle  calls  are  needed  in  order  to  syn¬ 
thesize  natural-sounding  whistles  with  embedded  information.  Since  the  whistle  calls 
are  composed  of  frequency  modulated  harmonic  tones,  they  are  best  modeled  as  a 
weighted  superposition  of  harmonically  related  sinusoids.  Previous  research  with  bot- 
tlenose  dolphin  whistle  calls  has  produced  synthetic  whistles  that  sound  too  “clean” 
for  use  in  a  covert  communications  system.  Due  to  the  sensitivity  of  the  human  audi¬ 
tory  system,  watermarking  schemes  that  slightly  modify  the  fundamental  frequency 
contour  have  good  potential  for  producing  natural-sounding  whistles  embedded  with 
retrievable  watermarks.  Structured  total  least  squares  is  used  with  linear  prediction 
analysis  to  track  the  time-varying  fundamental  frequency  and  harmonic  amplitude 
contours  throughout  a  whistle  call.  Simulation  and  experimental  results  demonstrate 
the  capability  to  accurately  model  bott.lenose  dolphin  whistle  calls  and  retrieve  em¬ 
bedded  information  from  watermarked  synthetic  whistle  calls.  Different  fundamental 
frequency  watermarking  schemes  are  proposed  based  on  their  ability  to  produce  natu¬ 
ral  sounding  synthetic  whistles  and  yield  suitable  watermark  detection  and  retrieval. 
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Title:  Associate  Scientist 


3 


Acknowledgments 

This  thesis  started  as  a  bold  idea  without  the  guarantee  of  success,  which  is  typical  of 

4 

all  true  research.  Jim  Preisig  was  uniquely  suited  to  work  with  me  on  the  project,  and 
from  the  start,  we  were  both  captivated  by  its  potential.  Jim’s  focus  and  persistence 
in  developing  a  thorough  understanding  of  the  frequency  estimation  problem  were 
essential  to  applying  the  recent  advances  in  structured  total  least  squares  to  the 
well-known  but  rarely-used  Prony’s  method.  When  it  conies  to  running  a  complex 
underwater  acoustic  communications  experiment,  Jim’s  proficiency  cannot  be  beat. 

I  want  to  thank  Art  Baggeroer  for  the  support  he  has  given  me  and  previous 
Navy  Joint  Program  students  throughout  the  years.  His  suggestions  were  crucial  for 
developing  my  theoretical  understanding  of  the  frequency  estimation  problem. 

Milica  Stojanovic  brought  a  burst  of  energy  in  the  final  stages  of  the  work,  at  the 
expense  of  her  own  research.  Her  enthusiasm  helped  turn  a  potential  crisis  into  the 
promising  watermarking  scheme  involving  continuous  phase  modulation. 

Stacy  DeRuiter  and  Laela.  Sayigh  helped  me  to  understand  marine  mammal  whis¬ 
tle  calls  from  a  biological  perspective,  and  gave  me  access  to  high  quality  recordings 
that  were  necessary  for  good  frequency  estimation. 

Marsha  Gomes  has  been  extremely  helpful  answering  all  of  my  administrative 
questions  and  looking  after  the  general  well-being  of  the  Joint  Program  students. 

Finally,  I  thank  my  Lord  and  Savior,  Jesus  Christ:  by  whom,  through  whom,  and 
for  whom  are  all  things;  who  has  called  me  by  name  unto  a  holy  calling  according  to 
His  own  purpose  and  grace;  who,  while  strengthening  me  to  complete  this  project, 
has  enabled  me  to  lead  others  into  a.  deeper  trust  in  Him;  and  who  testifies,  “Behold, 
I  come  quickly,  and  my  reward  is  with  me,  to  give  every  man  according  as  his  work 
shall  be.” 


5 


£ 


\ 


6 


I 


To  the  God  of  unbroken  promises 

& 

To  you  who  will  not  be  withheld 


7 


Contents 


1  Introduction  15 

1.1  Prior  Work  .  16 

1.1.1  Classification  of  Bottlenose  Dolphin  Whistle  Calls .  16 

1.1.2  Prior  Models  of  Bottlenose  Dolphin  Whistle  Calls .  16 

1.1.3  Related  Work  in  Human  Speech  Processing .  18 

1.2  Introduction  to  Information  Hiding .  20 

1.2.1  Steganography  .  20 

1.2.2  Watermarking .  21 

1.2.3  Applicable  Digital  Audio  Watermarking  Techniques .  21 

1.3  Objectives .  25 

1.4  Organization  .  25 

2  Sinusoidal  Modeling  Using  Linear  Prediction  27 

2.1  Prony’s  Method .  28 

2.2  Least  Squares  Prony  Method .  30 

2.3  Total  Least  Squares  Approach .  32 

2.3.1  Solution  to  the  Total  Least  Squares  Problem .  32 

2.3.2  Prony’s  Method  and  Total  Least  Squares .  36 

2.4  Structured  Total  Least  Squares  Approach .  37 

2.4.1  STLS  Solution  for  Hankel/Toeplitz  Matrices .  38 


9 


3  Simulation  Results  47 

3.1  Algorithm  Description .  48 

3.2  Frequency  Tracking  of  Chirp  Signals  . , .  53 

3.2.1  Single  Harmonic  Linear  Chirp .  53 

3.2.2  Double  Harmonic  Linear  Chirp . .  54 

3.2.3  Single  Harmonic  Linear  Chirp  with  Abrupt  Frequency  Shifts  .  55 

3.2.4  Single  Harmonic  Linear  +  Sinusoidal  Chirp  .  57 

3.2.5  Comparison  with  Alternative  Frequency  Estimators .  58 

3.3  Amplitude  Estimation  of  Chirp  Signals .  62 

4  Synthetic  Marine  Mammal  Whistle  Calls  67 

4.1  Modeling  Recorded  Bottlenose  Dolphin  Whistle  Calls .  67 

4.2  Watermarked  Synthetic  Whistle  Calls .  78 

4.2.1  Linear  Chirp  Segments  With  Abrupt  Frequency  Shifts  ....  80 

4.2.2  Continuous  Phase  Modulation .  83 

5  Experimental  Results  87 

5.1  RACE08  Description .  87 

5.2  RACE08  Results .  88 

6  Conclusions  and  Future  Directions  95 

A  Prony’s  Derivation  of  the  Linear  Prediction  Equations  99 


10 


List  of  Figures 


1-1  Quatieri  and  McAulay’s  speech  production  model  [53] .  19 

1-2  Communication  model  for  watermarking  [23] .  23 

1-3  Parametric  watermarking  scheme  [38] .  24 

3-1  HTLS  frequency  tracking  performance  for  a  linear  chirp  (SNR  =  5  dB)  54 

3-2  HTLS  frequency  tracking  performance  for  a  linear  chirp  with  two  har¬ 
monics  (SNR  =  5  dB)  .  55 

3-3  HTLS  frequency  tracking  performance  for  a  linear  chirp  with  abrupt 

frequency  shifts  (SNR  =  5  dB) .  56 

3-4  HTLS  frequency  tracking  performance  vs.  A  (SNR  =  15  dB) .  56 

3-5  HTLS  frequency  tracking  performance  for  sinusoidal  chirp  (SNR  =  5 

dB) .  58 

3-6  Linear  chirp  frequency  estimator  performance  vs.  CRLB .  60 

3-7  Tukey  window  with  a  =  0.5 .  63 

3-8  Amplitude  estimation  performance  for  double  harmonic  linear  chirp 

(SNR  =  50  dB) .  64 

3-9  Residual  MSE  for  double  harmonic  linear  chirp  (SNR  =  50  dB)  ...  65 

3-10  Amplitude  estimation  performance  for  double  harmonic  linear  chirp 

(SNR  =  25  dB) .  66 

3- 11  Residual  MSE  for  double  harmonic  linear  chirp  (SNR  =  25  dB)  ...  66 

4- 1  Bottlenose  dolphin  whistle  call .  68 


11 


4-2  Spectrogram  of  bottlenose  dolphin  whistle  call  in  Fig.  4-1  (dB)  ....  68 

4-3  Fundamental  frequency  estimation  of  Whistle  1  in  Fig.  4-1  using  over¬ 
lapping  bandpass  filters . . .  70 

4-4  Fundamental  frequency  contour  of  Whistle  1  in  Fig.  4-1 .  70 

4-5  Estimated  amplitude  contours  for  Whistle  1  in  Fig.  4-1 .  71 

4-6  Residual  MSE  for  Whistle  1  in  Fig.  4-1 .  72 

4-7  Recorded  (top)  vs.  synthetic  (bottom)  versions  of  Whistle  1  in  Fig.  4-1  74 

4-8  Spectrograms  of  recorded  (left)  vs.  synthetic  (right)  versions  of  Whis¬ 
tle  1  in  Fig.  4-1  (dB) .  74 

4-9  Fundamental  frequency  contour  of  Whistle  2  in  Fig.  4-1 .  75 

4-10  Estimated  amplitude  contours  for  Whistle  2  in  Fig.  4-1 .  75 

4-11  Fundamental  frequency  contour  of  Whistle  3  in  Fig.  4-1 .  76 

4-12  Estimated  amplitude  contours  for  Whistle  3  in  Fig.  4-1 .  76 

4-13  Residual  MSE  for  Whistle  2  in  Fig.  4-1 .  77 

4-14  Residual  MSE  for  Whistle  3  in  Fig.  4-1 .  77 

4-15  Impulse  response  of  moving- average  filter .  79 

4-16  Fundamental  frequency  and  IMF  contours  for  a  portion  of  Whistle  2 

in  Fig.  4-1 .  80 

4-17  Watermarking  scheme  based  on  linear  chirp  segments  with  abrupt  fre¬ 
quency  shifts  .  81 

4-18  Linear  chirp  watermarked  frequency  contour  of  Whistle  2  in  Fig.  4-1  .  82 

4-19  Watermarking  scheme  based  on  CPM  perturbation  of  the  IMF  contour  83 

4- 20  Unmodified  and  CPM-watermarked  frequency  contours  for  a  portion 

of  Whistle  2  in  Fig.  4-1 .  85 

5- 1  University  of  Rhode  Island’s  Narragansett  Bay  Campus .  88 

5-2  Spectrograms  of  unmodified  synthetic  whistle  calls  received  at  the  ref¬ 
erence  (left.)  and  N1000  (right)  hydrophones  (dB) .  89 


12 


5-3  Spectrograms  of  watermarked  synthetic  whistle  calls  received  at  the 

reference  (left)  and  N1000  (right.)  hydrophones  (dB) .  90 

5-4  Reference  hydrophone  recording  of  unmodified  Whistle  3  in  Fig.  5-2  .  90 

5-5  Frequency  estimation  performance  for  unmodified  (left.)  and  water¬ 
marked  (right)  whistle  contours  received  at  reference  hydrophone  .  .  91 

5-6  Frequency  estimation  performance  for  unmodified  (left)  and  water¬ 
marked  (right)  whistle  contours  received  at  N1000  hydrophone  ...  92 

5-7  Frequency  estimation  performance  for  unmodified  whistle  contour  re¬ 
ceived  at  N1000  hydrophone .  93 

5-8  Frequency  estimation  performance  for  watermarked  whistle  contour 

received  at  N1000  hydrophone  .  93 


13 


14 


Chapter  1 


Introduction 


Due  to  the  challenges  of  the  underwater  ocean  environment,  minimal  progress  has 
been  made  in  the  field  of  covert  underwater  acoustic  communications.  In  some  appli¬ 
cations,  low  data  rates  would  be  an  acceptable  tradeoff  for  a  sufficiently  low  probabil¬ 
ity  of  detection.  Current  robust  underwater  acoustic  communication  systems  rely  on 
a  relatively  high  Signal-to-Noise  Ratio  (SNR)  that  precludes  a  covert  posture.  Ma¬ 
rine  biologies  provide  a  significant  source  of  background  noise  that  any  underwater 
acoustic  communications  system  needs  to  overcome.  However,  if  the  communications 
scheme  was  able  to  mimic  marine  biologies  in  their  natural  environment,  a  covert 
posture  may  be  retained  while  operating  at  a  relatively  high  SNR. 

Marine  mammal  whistle  calls  are  an  attractive  medium  for  masking  underwater 
acoustic  communications  due  to  their  low  frequency  range,  relatively  sustained  du¬ 
ration  and  regular  harmonic  structure.  High-quality  synthetic  models  are  needed  to 
effectively  mimic  marine  mammal  whistle  calls  with  an  embedded  information  signal. 
This  thesis  focuses  on  developing  techniques  for  processing  and  embedding  informa¬ 
tion  in  bottlenose  dolphin  whistle  calls,  but  the  techniques  derived  are  applicable  to 
other  harmonically-structured  tonal  signals,  including  other  marine  mammal  whistle 
calls. 
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1.1  Prior  Work 


1.1.1  Classification  of  Bottlenose  Dolphin  Whistle  Calls 

Christian  [9]  compiled  a  database  of  bottlenose  dolphin  whistle  calls  for  his  research 
on  using  generic  signal  compression  for  the  identification,  characterization  and  repeti¬ 
tion  detection  of  various  signals.  His  approach  estimated  the  fundamental  frequency 
contour  of  a  whistle  call,  recorded  with  a  nominal  50  kHz  sample  rate,  using  512 
point  blocks  with  no  overlap,  as  higher  resolution  was  not  considered  necessary.  He 
compared  the  periodogram  and  Burg’s  autoregressive  (AR)  methods  of  spectral  es¬ 
timation,  and  concluded  that  the  periodogram  provided  sufficient  resolution  of  the 
fundamental  frequency  when  compared  to  the  computationally  expensive  Burg  tech¬ 
nique.  Five  major  spectral  peaks  from  each  block  were  retained  from  which  a  tracking 
algorithm  resolved  the  fundamental  frequency  contour.  A  16-dimension  coding  space 
was  then  developed  using  the  fundamental  frequency  contour  to  generate  a  dictionary 
of  unique  whistles.  Single  dolphins  were  found  to  reproduce  their  signature  whistles 
very  precisely,  and  were  estimated  to  be  capable  of  producing  over  a  billion  unique 
whistles. 


1.1.2  Prior  Models  of  Bottlenose  Dolphin  Whistle  Calls 

Although  some  methods  used  in  human  speech  analysis  and  synthesis  have  been 
tested  on  marine  mammals  [3,  46],  Buck  et  al.  [5,  24]  have  been  behind  the  effort 
to  model  bottlenose  dolphin  whistle  calls  for  synthesis  and  modification  purposes.  A 
parametric  model  that  can  synthesize  natural-sounding  whistles  can  be  used  to  study 
how  dolphins  communicate  by  modifying  the  whistle  frequency  contour  and  observing 
the  response  of  dolphins. 


16 


Weighted  Superposition  of  Sinusoidal  Harmonics 


Buck  et  al.  [5]  initially  proposed  a  whistle  model  characterized  as  the  weighted  su¬ 
perposition  of  harmonically  related  sinusoids, 

R 

s[n]  =  'S^ar[n]sin(2n4>r[n])  ,  (1.1) 

r=l 

which  embodies  their  typical  description  as  frequency-modulated  tonal  calls.  The 
fundamental  frequency  contour  is  extracted  using  a  peak-picking  algorithm  detailed 
in  [6],  which  was  found  to  work  well  for  recordings  of  individual  animals  at  high 
SNR.  The  signal  is  broken  into  short  blocks  for  which  it  is  assumed  to  be  relatively 
constant  in  amplitude  and  frequency.  Frequency  and  energy  contours  for  each  har¬ 
monic  are  constructed  from  analyzing  each  block.  Different  modification  strategies 
are  proposed  that  modify  different  characteristics  of  the  frequency  and  energy  con¬ 
tours.  Finally,  whistles  are  synthesized  at  the  original  sample  rate  by  interpolating 
phase  and  amplitude  contours  from  the  compressed  frequency  and  energy  contours. 
This  technique  differed  from  other  speech  processing  algorithms  [2,  46,  64]  primarily 
in  that  discrete-time  upsampling  was  performed  instead  of  linear  or  polynomial  inter¬ 
polation  between  blocks.  Example  whistles  recorded  at  81.92  kHz  were  synthesized 
using  a  block  length  of  512  samples  with  50%  overlap.  Human  testing  could  dis¬ 
tinguish  between  the  original  and  unmodified  synthetic  whistles  using  quarter-speed 
in-air  playbacks.  The  synthetic  whistles  were  characterized  as  “clean  sounding”  and 
“not  enough  noise”  when  compared  to  the  original  whistles. 

Autoregressive  Model 

Based  on  the  distinct  perceptual  differences  between  original  dolphin  whistles  and 
their  synthetic  counterparts  produced  with  the  sinusoidal  model,  Buck's  student 
Huang  [24]  proposed  using  an  AR  synthesis  model  to  generate  more  natural-sounding 
synthetic  whistles.  The  whistle  was  broken  into  blocks  of  length  512  samples  with 
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75%  overlap,  which  are  smoothly  recombined  during  synthesis  using  a  half- amplitude 
Hamming  window.  Each  block  was  then  modeled  using  a  high  order  (p  —  60)  AR 
model.  It  was  noted  that  the  signal  residue  power  spectrum  pontained  a  noticeable 
component  of  the  original  frequency  contour.  For  each  block,  the  resulting  system 
poles  were  compared  to  the  frequency  contour  used  in  the  sinusbidal  model  for  select¬ 
ing  signal  poles  corresponding  to  each  harmonic.  The  whistles  are  then  synthesized 
by  driving  the  corresponding  all-pole  filter  for  each  block  with  the  signal  residue  for 
unmodified  whistles  and  a  white  noise  residue  for  modified  whistles.  While  the  AR 
synthesis  whistles  sounded  more  “natural”  than  the  cleaner  sinusoidal  synthesis  whis¬ 
tles,  a  study  has  not  been  performed  to  assess  the  overall  quality  of  the  AR  synthesis 
whistles.  Some  problems  encountered  were  the  high  computational  load  and  the  need 
to  choose  algorithm  parameters  such  as  block  length,  amount  of  overlap  and  AR 
system  individually  for  each  dolphin  whistle. 

1.1.3  Related  Work  in  Human  Speech  Processing 

Generally,  human  speech  processing  has  focused  on  a  stochastic  model  for  speech 
production  that  seeks  to  design  filters  that  imitate  the  physical  dynamics  of  speech  [15, 
41],  These  filters  are  then  driven  by  combinations  of  two  basic  forms  of  excitation, 
periodic  impulses  for  voiced  speech  and  white  noise  for  unvoiced  speech.  Linear 
prediction  analysis  is  usually  used  to  design  all-pole  filters  that  describe  short  blocks 
of  similar  speech  patterns.  Cepst.ral  analysis  was  developed  to  separate  the  impulse 
response  of  the  vocal  system  model  from  the  excitation  sequence,  but  its  application 
is  limited  based  on  its  computational  complexity. 

The  basic  sinusoidal  superposition  model  in  Eq.  (1.1)  used  by  Buck  et  al.  [5] 
has  been  researched  in  human  speech  processing  with  excellent  results.  Serra  and 
Smith  [64]  note  that  additive  synthesis  algorithms  were  among  the  first  techniques 
used  in  computer-based  synthesis,  with  the  introduction  of  the  heterodyne  filter  in 
the  early  1970’s,  followed  by  the  digital  phase  vocoder.  McAulay  and  Quatieri  [46,  53] 
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Figure  1-1:  Quatieri  and  McAulay’s  speech  production  model  [53] 


and  Smith  and  Serra  [65]  developed  similar  algorithms  at  about  the  same  time  that 
addressed  inharmonic  and  pitch-changing  sounds.  Essentially,  each  algorithm  used 
the  same  sinusoidal  model  while  developing  new  methods  to  track  relevant  frequency 
contours  and  smoothly  vary  amplitude  and  phase  from  block  to  block.  The  signal 
was  broken  into  analysis  blocks,  with  overlap  ranging  from  50%  to  75%,  and  relevant 
frequencies  selected  based  on  peaks  in  the  discrete  Fourier  transform.  McAulay  and 
Quatieri  included  a  time-varying  filter  model  of  the  vocal  tract  at  the  output  of  the 
sinusoidal  representation,  as  seen  in  Fig.  1-1.  For  a  variety  of  sounds,  including  some 
whale  sounds,  their  algorithm  was  reported  to  produce  synthetic  signals  “essentially 
perceptually  indistinguishable”  from  the  original  signal.  Serra  and  Smith  [64]  up¬ 
dated  their  algorithm  to  better  incorporate  noise-like  aspects  of  speech  by  removing 
the  sinusoidal  representation  from  the  original  signal  and  then  applying  stochastic 
modeling  to  the  residual,  but  found  that  combining  the  sinusoidal  and  stochastic 
components  sometimes  produced  undesirable  results.  The  deterministic  plus  stochas¬ 
tic  model  was  refined  by  Levine  [36]  by  further  decomposing  the  stochastic  component 
into  a  quasi-st, ationary  “noise”  part  and  a  rapidly  changing  “transient”  part,  resulting 
in  a  coding  scheme  that  is  both  efficient  and  expressive  [38]. 
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1.2  Introduction  to  Information  Hiding 


The  field  of  information  hiding  [11,  27]  has  largely  grown  out  of  the  field  of  cryp- 
tography  to  include  the  additional  aspect  of  keeping  the  existence  of  the  information 
secret.  A  lot  of  the  techniques  that  are  used  in  information  hiding  draw  upon  the 
experience  gained  from  cryptography,  and  in  many  cases  the  lines  between  the  two 
are  blurred,  since  any  cryptographic  system  would  be  more  robust  to  attack  if  its  very 
existence  was  a  secret.  However,  the  practical  wisdom  of  cryptography  teaches  that 
sensitive  information  should  also  be  protected  by  a  secret  key,  to  safeguard  against 
the  information  hiding  techniques  being  discovered  [50].  In  general,  information  hid¬ 
ing  techniques  can  be  divided  into  four  categories,  which  either  include  or  exclude 
the  separate  principles  of  steganography  and  watermarking  based  on  their  applica¬ 
tion  [11], 

1.2.1  Steganography 

Steganography  is  the  art  of  concealed  communication,  in  which  the  very  existence  of 
a  message  is  secret  [11].  Most  applications  of  steganography  follow  the  same  general 
principle  [26]  described  as  follows.  Alice,  who  wants  to  share  a  secret  message  m 
with  Bob,  randomly  chooses  a  harmless  message  c,  called  cover- object,  which  can 
be  transmitted  to  Bob  without  raising  suspicion.  With  the  potential  use  of  a  secret 
key  k,  a  stego-object  s  is  generated  by  embedding  m  into  c  in  a  careful  way  so  that 
a  third  party  cannot  detect  the  existence  of  a  secret  in  the  apparently  harmless 
message  s.  Alice  then  transmits  s  to  Bob  over  an  insecure  channel,  hoping  that 
Wendy,  a  suspicious  person  with  access  to  s,  will  not  notice  the  embedded  message. 
Bob  can  reconstruct  m,  since  he  knows  the  embedding  method  used  by  Alice  and 
has  access  to  the  key  k  used  in  the  embedding  process.  The  extraction  of  m  from 
s  should  be  possible  without  access  to  the  original  cover  c.  In  a  “perfect”  system, 
a  normal  cover  should  be  indistinguishable  from  a  stego-object,  either  by  a  human 
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or  computer  looking  for  a  statistical  pattern.  There  are  basically  three  types  of 
steganographic  protocols  that  differ  based  on  the  choice  of  k.  Pure  steganography 
does  not  incorporate  the  prior  exchange  of  secret  information,  ?so  a  key  is  not  used  in 
the  embedding  process.  Secret  key  and  public  key  steganography  bolster  security  by 
using  a  secret  or  public  key  in  the  embedding  process,  although  both  use  a  secret  key 
to  reconstruct  the  secret  message  [26]. 

1.2.2  Watermarking 

Watermarking,  while  closely  related  to  steganography,  is  based  on  different  underlying 
philosophies,  requirements,  and  applications  that  result  in  techniques  that  clearly 
distinguish  themselves  from  steganography.  Essentially,  the  purpose  of  a  watermark 
is  to  embed  self-identifying  information  within  a  cover-object,  that  can  be  used  for 
copyright  protection  or  tracking  purposes.  While  the  existence  of  a  watermark  does 
not  normally  need  to  be  kept  secret,  the  watermark  should  be  permanently  attached 
to  the  cover-object.  Thus,  watermarking  has  the  notion  of  being  robust  to  both 
malicious  and  benign  attacks  to  remove  the  identifying  information.  In  practical 
commercial  applications,  the  watermark  should  be  perceptually  transparent  enough 
to  not  annoy  consumers  or  reduce  the  value  of  the  product  [32] . 

1.2.3  Applicable  Digital  Audio  Watermarking  Techniques 

Watermarking  of  digital  audio  signals  is  more  challenging  compared  to  watermarking 
image  or  video  sequences  due  to  the  wide  dynamic  range  of  the  human  auditory 
system  (HAS).  The  HAS  perceives  sounds  over  a  range  of  power  greater  than  109  :  1 
and  a  range  of  frequencies  greater  than  103  :  1.  In  particular,  the  HAS  has  a  high 
sensitivity  to  additive  white  Gaussian  noise,  which  can  be  detected  as  low  as  80  dB 
below  ambient  level  in  a  sound  hie.  However,  there  are  some  “holes”  available  in 
which  to  place  a  watermark.  While  the  HAS  has  a  wide  dynamic  range,  it  has 
a  small  differential  range,  meaning  loud  sounds  generally  tend  to  mask  out  quiet 
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sounds.  Additionally,  the  HAS  is  insensitive  to  a  constant  relative  phase  shift  in  a 
stationary  audio  signal.  Finally,  some  environmental  distortions  are  so  common  that 
they  are  ignored  by  the  listener  in  most  cases  [4,  12].  . 

Due  to  the  sensitivity  of  the  HAS,  digital  audio  watermarking  techniques  apply 
directly  to  steganographic  applications,  since  on  a  perceptual  basis  the  existence  of 
an  embedded  message  needs  to  be  kept  a  secret.  In  a  covert  communications  scenario, 
the  robustness  against  intentional  attacks  is  not  usually  required,  although  signal  pro¬ 
cessing  modifications,  channel-induced  signal  distortion  and  additive  ambient  noise 
should  not  prevent  retrieval  of  the  watermark.  In  these  applications,  the  watermark  is 
expected  to  achieve  a  higher  data  rate  and  use  blind  detection  schemes  for  watermark 
detection  and  reconstruction  [12]. 

Fig.  1-2  shows  a  basic  model  depicting  watermarking  as  a  communications  pro¬ 
cess,  as  described  by  He  and  Seordilis  [23] .  A  secret  message  is  modulated  into  a 
watermark  waveform  using  a  secret  key.  The  watermark  is  embedded  imperceptibly 
into  a  host  signal  to  form  the  stego-signal.  Transmission  through  a  channel  adds 
noise  and  distortion  to  the  stego-signal.  The  watermark  detector  reconstructs  the 
watermark  from  the  received  signal  using  the  secret  key,  and  in  some  cases,  the  host 
signal.  Blind  detection,  in  which  the  host  signal  is  not  available,  is  more  flexible  in 
operation,  but  lowers  the  achievable  data  rate  by  making  detection  more  complex. 

In  the  underwater  channel,  the  primary  sources  of  distortion  are  multipath  arrivals 
and  Doppler  spreading  [29,  51].  In  order  to  combat  these  effects  and  maintain  the 
fidelity  of  the  stego-signal,  the  best  watermarking  scheme  appears  to  be  based  on 
slight  modifications  of  the  fundamental  frequency  contour  that  result  in  natural¬ 
sounding  stego-signals.  Liu  [38]  has  focused  on  a  parametric  approach  to  digital 
audio  watermarking  that  is  heavily  based  on  the  sinusoidal  synthesis  model  and  the 
work  of  Smith,  Serra  and  Levine  [36,  64].  Fig.  1-3  shows  the  watermarking  scheme 
based  on  parametric  analysis  and  synthesis  proposed  by  Liu  [38].  To  embed  a  binary 
watermark  W,  the  host  signal  is  first  decomposed  into  s  =  S|@)  +  r,  where  s^)  is 
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Figure  1-2:  Communication  model  for  watermarking  [23] 


perfectly  parameterized  and  r  is  a  residual  orthogonal  to  s^).  Then,  the  parameter 
set  [61)  is  modified  to  |#*)  to  carry  the  watermark  W.  The  new  signal  S|#.),  constructed 
from  the  watermarked  parameter  set  .  is  combined  with  r  to  form  the  stego-signal  x 
which  is  transmitted  through  a  channel  with  unknown  noise  and  distortion.  Upon 
the  reception  of  a  corrupted  copy  y,  parameters  are  estimated  so  as  to  decode  W. 
The  attempt  at  watermarking  is  successful  if  the  estimated  parameters  1 9)  are  close 
enough  to  I#*}  such  that  the  decoded  binary  message  W  is  identical  to  W.  There  is  an 
inherent  tradeoff  when  determining  how  6  is  modified  to  6*:  the  modification  should 
be  small  enough  to  not  introduce  perceptible  distortion,  but  it  should  also  be  as  big  as 
possible  to  maximize  robustness  against  attacks.  In  the  case  of  a  digital  audio  signal, 
the  parametric  component  matches  the  sinusoidal  model  perfectly  and  receives 
the  watermark,  while  the  stochastic  component  r  is  removed  during  watermarking 
but  then  added  back  in  for  transmission  to  minimize  perceptible  alteration  from  the 
host  signal  s. 

Chen  and  Wornell  [8]  designed  a  class  of  digital  watermarking  techniques  called 
quantization  index  modulation  (QIM)  that  were  shown  to  reach  or  nearly  reach  em¬ 
bedding  rate  capacity  for  important  classes  of  models.  However,  this  simplest  form 


23 


Figure  1-3:  Parametric  watermarking  scheme  [38] 


of  QIM  was  not  robust  to  amplitude  scaling,  which  is  a  common  operation  in  music 
processing.  Liu  is  currently  working  on  the  development  of  a  F-QIM  watermarking 
scheme  that  applies  QIM  techniques  to  the  frequency  parameters  in  the  sinusoidal 
synthesis  model  [38]. 

Krishnan  et  al.  have  proposed  a  watermarking  scheme  based  on  joint  time  fre¬ 
quency  analysis  of  the  audio  signal  [30].  Most  of  the  other  watermarking  techniques 
analyze  audio  in  either  the  time  or  frequency  domains  separately,  which  does  address 
the  nonstationarity  of  audio  signals.  Krishnan  et  al.  calculate  the  instantaneous  mean 
frequency  (IMF)  of  the  audio  signal  using  the  Wigner-Ville  distribution  (WVD).  The 
WVD  is  a  time  frequency  distribution  that  gives  a  clear  picture  of  the  instantaneous 
frequency  and  group  delay  of  a  signal,  but  suffers  from  confusing  artifacts  when  the 
signals  are  multicomponent  [10].  The  IMF  for  short  blocks  of  the  signal  is  deter¬ 
mined,  and  then  a  spread  spectrum  watermarking  scheme  is  implemented;  to  recover 
the  watermark  the  IMF  for  the  original  signal  is  needed.  Krishnan  et  al.  also  propose 
a  chirp  based  spread  spectrum  watermarking  scheme  that  reduces  the  complexity 
of  watermark  detection  relative  to  the  IMF  scheme.  The  detector  extracts  the  wa¬ 
termarking  bits  and  uses  the  WVD  and  a  chirp  detection  algorithm  to  decode  the 
watermark  [30]. 
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1.3  Objectives 


This  thesis  proposes  a  new  approach  for  determining  the  parameters  of  the  sinusoidal 
superposition  model  of  Eq.  (1.1)  to  represent  recorded  marine  mammal  whistle  calls. 
To  achieve  high  quality  results,  the  recordings  are  assumed  to  consist  solely  of  tonal 
whistle  calls  at  high  SNR  produced  by  a  single  animal,  without  contamination  by  high 
frequency  clicks.  A  new  method  for  tracking  the  nonlinear  fluctuations  in  a  whistle 
call’s  fundamental  frequency  contour  is  developed  based  on  the  structured  total  least 
squares  method.  Amplitude  contours  for  each  harmonic  are  then  determined  using 
the  estimated  fundamental  frequency  contour  and  Prony’s  method.  Different  meth¬ 
ods  of  watermarking  the  fundamental  frequency  contour  are  examined  in  terms  of 
human  imperceptibility  and  complexity  of  watermark  reconstruction  in  the  underwa¬ 
ter  environment.  Experimental  data  is  presented  demonstrating  the  ability  to  track  a 
whistle’s  fundamental  frequency  contour  in  an  underwater  communications  scenario. 
In  summary,  the  ability  to  communicate  at  low  data  rates  using  a  natural-sounding 
synthetic  marine  mammal  whistle  call  is  demonstrated. 

1.4  Organization 

The  remainder  of  this  thesis  consists  of  five  chapters.  Chapter  2  develops  the  pro¬ 
gression  of  linear  prediction  techniques  to  model  exponentially  damped  sinusoidal 
data.  Chapter  3  describes  a  new  approach  to  estimate  the  frequency  and  amplitude 
contours  of  chirp  signals.  Simulation  results  demonstrate  the  performance  of  the  new 
approach,  and  other  frequency  estimation  methods  are  compared  to  the  structured 
total  least  squares  method.  Chapter  4  applies  the  results  of  Chapter  3  to  building 
synthetic  bottlenose  dolphin  whistle  calls  and  examines  different  approaches  to  wa¬ 
termarking  synthetic  whistles.  Chapter  5  presents  data  from  a  shallow  water  ocean 
experiment  testing  watermarked  chirps  and  synthetic  whistle  calls.  Finally,  Chapter  6 
closes  with  conclusions  and  indicates  future  directions  for  research. 
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Chapter  2 


Sinusoidal  Modeling  Using  Linear 
Prediction 


The  term  Unear  prediction  as  a  method  for  time  series  analysis  dates  back  to  Wiener 
in  1949  [41,  74].  Since  then,  it  has  been  widely  applied  in  many  fields  for  the  modeling, 
parameterization,  prediction,  and  control  of  dynamic  systems  and  signals  [42],  and 
has  been  used  in  speech  analysis  and  synthesis  since  1966  [41].  Generally,  the  work 
focuses  on  discrete  stochastic  models  of  autoregressive  (AR)  systems  whose  value  at 
any  point  in  time  is  a  linear  combination  of  a  finite  number  of  past  samples  plus 
additive  noise.  Signals  are  parameterized  in  the  linear  prediction  or  autoregressive 
coefficients,  and  can  then  be  synthesized  by  driving  a  corresponding  all-pole  filter 
with  white  noise  [15,  21,  42].  Spectral  estimation  is  performed  by  fitting  an  AR 
model  to  the  data’s  autocorrelation  sequence  and  transforming  into  the  frequency  do¬ 
main.  Although  it  is  not  a  spectral  estimation  technique,  Prony’s  method  has  a  close 
relationship  to  the  least  squares  linear  prediction  algorithms  used  for  AR  parameter 
estimation.  In  contrast  to  AR  methods  that  seek  to  fit  a  random  model  to  the  second 
order  statistics,  the  modern  version  of  Prony’s  method  seeks  to  fit  a  deterministic  ex¬ 
ponentially  damped  sinusoidal  model  to  the  data  [43].  Based  on  the  sustained  tonal 
characteristic  of  a  marine  mammal  whistle  call,  applying  a  deterministic  sinusoidal 


27 


model  is  an  intuitive  starting  point  for  estimating  whistle  call  frequency  contours. 


2.1  Prony’s  Method 


Gaspard  Riche,  Baron  de  Prony’s  paper  [14,  43]  proposed  in  1795  a  method  for  exactly 
fitting  damped  exponentials  to  available  data  points  for  his  research  on  the  expansion 
of  various  gases.  The  modern  form  of  Prony’s  method  generalizes  to  identifying  the 
amplitudes  Ak,  damping  factors  ak,  sinusoidal  frequencies  /*.,  and  initial  phases  9k  of 
a  linear  combination  of  complex  exponentials, 

p 

x[n]  =  exP[(a*  +  j2n  fk)(n  -  1  )T  +  j6k]  (2.1) 

k= 1 

for  1  <  n  <  p,  where  T  is  the  sample  interval.  In  the  case  of  real  data,  the  complex 
exponentials  must  occur  in  complex  conjugate  pairs  of  equal  amplitude,  reducing 
Eq.  (2.1)  to 

p 

x\n)  =  2Ak  exp [(ak{n  -  1  )T]  cos[2 nfk(n  -  1  )T  +  6k]  .  (2.2) 

k= i 

Eq.  (2.1)  can  be  written  in  the  form 

p 

x[n]  =  Y2  hkzk~l  i  (2-3) 

k=l 

where  the  complex  constants  hk  and  zk  are  defined  as 


hk  =  Akexp(jdk)  , 
zk  =  exp[(oA.  +  j2nfk)T] 
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(2.4) 

(2.5) 


Expressing  Eq.  (2.3)  in  matrix  form  as  a  set  of  simultaneous  equations  for  1  <  n  <  p 
results  in 


Prony  discovered  a  method  to  separately  solve  for  the  exponential  Zk  elements,  from 
which  Eq.  (2.6)  can  then  be  solved  for  the  vector  of  unknown  constants  /?,*..  Ap¬ 
pendix  A  shows  that  Eq.  (2.3)  is  the  solution  to  a  homogeneous  constant-coefficient, 
difference  equation 

p 

w[m]x[n  —  m]  =  0  ,  (2.7) 

m— 0 

where  w[m]  are  the  coefficients  of  the  polynomial  <f>(z)  with  roots  Zk, 

p  p 

0(~)  =  II(*  ~  Zk )  =  ~P  +  w[m]zp~m  .  (2.8) 

k=l  m—  1 

The  p  equations  for  which  Eq.  (2.7)  is  valid,  p  +  1  <  n  <  2 p,  can  be  expressed  in 
matrix  form  as 


x[p]  x[p  —  1]  ...  x[l]  10  [1] 

x\p+l]  x[p]  ...  x[2]  w[  2] 

x[2p  -  1]  x[2 p  -  2]  ...  x\p\  w\p ] 


x[p  +  1] 
x\p  +  2] 

x\2p\ 


(2.9) 


Prony’s  method  to  fit  p  exponentials  to  2 p  data  points  can  be  summarized  in  three 
steps.  First,  Eq.  (2.9)  is  solved  to  determine  the  coefficients  of  the  polynomial  <p(z) 
in  Eq.  (2.8).  Second,  the  roots  z*.  of  (p{z)  are  calculated.  Third,  Eq.  (2.6)  is  solved  to 
determine  the  parameters  /p. . 
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The  desired  parameters  are  then  determined  by  the  relationships 


ak  =  \n\zk\/T 

(2.10) 

fk  =  tan-1[Im{2fc}/  Re{zk}}/2nT 

(2.11) 

II 

(2.12) 

Bk  =  tan  1[Im{/ifc}/Re{fifc}]  • 

(2.13) 

2.2  Least  Squares  Prony  Method 

In  practical  situations,  the  presence  of  some  noise  in  the  data  sequence  prevents 
obtaining  an  exact-  exponential  fit,  to  the  data,  so  the  number  of  data  points  N  usually 
exceeds  the  2 p  data  points  used  in  the  original  Prony  method.  In  this  overdetermined 
case,  the  data  is  approximated  as  an  exponential  sequence, 

x{ri]  =  .  (2.14) 

1 

for  1  <  n  <  N,  with  observation  error  e[n]  =  x[n]  —  x[n].  Applying  standard  linear 
least  squares  (LS)  procedures  [19]  to  the  original  Prony  method  results  in  the  three- 
step  LS  Prony  method.  First,  forming  the  linear  prediction  relation 


Aw  «  b  ,  (2-15) 


x\p] 

x{p  ~  1]  . 

*[!] 

“li] 

1 

H 

+ 
i — i 

.  j 

A  = 

x[p  -f  1] 

x\p] 

x[2] 

,  w  = 

»[2] 

,  and  b  =  — 

x[p  +  2] 

x[N  -  1] 

x[N-2]  .. 

i 

1 

H 

9 

x[N] 

the  LS  solution  is  given  by 


wLS  =  (Aff  A)_1A"b 


(2.16) 
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Second,  the  roots  zk  of  <f>(z )  in  Eq.  (2.8)  are  calculated.  Third,  the  LS  solution  for 
the  parameters  hk  is  given  by 


hLS  =  (Zi/Z)-1Z77x  ,  (2.17) 

where 


1 

1 

1 

hi 

l - 

t-H 

1 _ 

Zi 

Z<1 

Zp 

.  h  = 

h2 

,  and  x  = 

*[2] 

z?-1 

Uv-i 

"2 

*  5; 

1 

hp 

a*  [A] 

Unfortunately,  the  LS  Prony  method  doesn’t  perform  well  in  the  presence  of  signifi¬ 
cant  additive  noise  because  it  assumes  the  data  matrix  A  is  error  free  and  models  the 
observation  error  in  b  as  white  noise.  Different  methods  that  have  been  used  to  im¬ 
prove  the  performance  of  the  Prony  method  include  employing  high  prediction  orders 
and  reduced  rank  approximations  of  the  data  matrix  via  singular  value  decomposition 
(SVD)  [31,  43,  68,  69].  The  higher  prediction  order  improves  the  estimation  of  signal 
parameters  by  adding  extra  exponentials  to  model  the  additive  noise.  The  poles  zk 
related  to  the  true  signal  exponentials  cluster  closer  to  their  correct  values,  while  the 
extraneous  poles  fluctuate  widely  to  account  for  the  noise.  The  noise  contribution  to 
the  data  matrix  A  can  be  reduced  by  using  its  reduced  rank  approximation 

AK  =  VKY,KV”  (2-18) 

composed  of  the  largest  K  singular  values  and  singular  vectors  of  A,  where  K  is  the 
number  of  signal  exponentials,  and 


A  =  USV" 


(2.19) 
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where 


U=  [ul5  uw_p],  ut  G  Rn~p, 

V  =  [Vl5  .  .  .  ,  Vp],  V;  £  Rp, 

£  —  diagfcrj,  ....  Up),  ^  ^  ^min(Ar— p,  p)  > 

is  the  SVD  of  A  with  UWU  =  I N_p  and  VHV  =  Ip.  The  principle  eigenvector  (PE) 
method  developed  by  Tufts  and  Kumaresan  [68,  69]  uses  both  a  high  prediction  order 
and  the  reduced  rank  approximation  of  Eq.  (2.18)  to  improve  Prony’s  method  in  the 
presence  of  noise.  More  recent  work  has  applied  a  modified  LS  Prony  method  to  the 
frequency  estimation  problem  [25,  39,  66]. 


2.3  Total  Least  Squares  Approach 

In  the  classical  LS  problem  of  Eq.  (2.15),  there  is  an  underlying  assumption  [18]  that 
all  of  the  errors  are  confined  to  the  vector  b,  i.e.,  that  the  data  matrix  A  has  no  errors. 
Since  both  A  and  b  contain  values  from  the  data  sequence  x[n ]  for  1  <  n  <  N,  errors 
in  b  will  also  appear  in  A.  The  total  least  squares  (TLS)  method  [18,  73]  compensates 
for  error  in  both  A  and  b,  and  should  be  expected  to  give  a  better  solution  than 
Eq.  (2.15). 

2.3.1  Solution  to  the  Total  Least  Squares  Problem 

A  good  way  to  motivate  the  TLS  method  is  to  state  the  ordinary  LS  problem  as 
follows: 


minimize  I!  Ab||2  (2.20) 

Ab  e 

subject  to  b  +  Ab  e  Range(A) 
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where  ||  ■  ||2  denotes  the  l2  norm  given  by 


Ab||2  = 


(2-21) 


The  LS  problem  amounts  to  perturbing  the  observation  b  by  a  minimum  amount  Ab 
so  b  +  Ab  can  be  predicted  by  the  columns  of  A.  The  TLS  problem  accounts  for 
perturbation  in  both  b  and  A,  i.e., 


(A  +  AA)  w  =  b  +  Ab 


(2.22) 


or  expressing  Eq.  (2.22)  in  a  different  form, 


([A  b 


+ 


AA  Ab 


w 

-1 


=  0 


or 


(C  +  AC)  z  =  0 


(2.23) 


where 


r  i 

r  i 

w 

c  = 

A  b 

,  AC  = 

AA  Ab 

,  and  z  = 

-1 

The  TLS  problem  seeks  to 


minimize  ||AC||f  (2.24) 

AC  €  K^-pSxfp+D 

subject,  to  (b  +  Ab)  €  Range(A  +  AA) 
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where  ||  •  \\p  denotes  the  Frobenius  norm  given  by 

I|AC||f  =  JY,  IM/  .  (2.25) 

y  i.j 

Eq.  (2.23)  shows  that  the  TLS  problem  involves  finding  a  perturbation  matrix  AC  €E 
^(N-P)x(p+i)  iiavjng  minimum  norm  such  that  C  +  AC  is  rank  deficient.  The  SVD 
can  be  used  for  this  purpose.  Let 

C  =  UEVW  (2.26) 

where 

U  =  [ui,  •  ••,  Ujv-J,  Uj  €  Ra“p, 

V  =  [vl5  ...,  vp+1],  vt  G  Mp+1, 

S  =  diag(<7!,  . . . ,  (Jp+ 1),  aj  >  ■  ■  ■  >  ok  >  ak+1  =  ■■■  =  crp+l  >  0, 

be  the  SVD  of  C  with  UHU  =  I^_p  and  VHV  =  Ip+i.  It  is  assumed  here  that  the 
problem  is  overdetermined,  i.e.,  N  >  2 p.  Two  cases  arise  in  the  TLS  solution.  In  the 
first  case,  when  ap  >  crp+i,  a  unique  solution  exists.  The  solution  can  be  thought  of 
as  finding  a  matrix  (C  +  AC)  of  rank  p  that  satisfies  Eq.  (2.24).  A  reduced  rank 
approximation  to 

p+  i 

c  =  Oj  Ujvf  (2.27) 

is  obtained  by  removing  one  or  more  ol  terms  from  Eq.  (2.27).  The  smallest  pertur¬ 
bation  AC  that  reduces  the  rank  of  C  by  one  is 

AC  =  ~(Jp+lup+1v^+1  .  (2.28) 
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Inserting  Eq.  (2.28)  into  Eq.  (2.23)  yields  z  =  avp+1,  since  vp+1  is  now  in  the  nullspace 
of 

p 

(C  +  AC)  =  a'u<v"  ■  (2.29) 

i=l 

Thus,  provided  (vp+1)p+1  ^  0,  the  TLS  solution  is  given  by 


WTLS  = 


-1 

(vp+l)p+l 


(Vp+l)l 

(vp+l  )p 


(2.30) 


The  TLS  solution  does  not  exist  if  (vp+J)p+]  =  0,  but  this  doesn’t  commonly  arise 
in  engineering  applications.  In  the  second  case,  when  ov  =  <rp+1,  a  solution  may 
still  exist,  but  it  is  not  unique.  However,  a  unique  solution  is  chosen  in  the  sense  of 
minimum  norm  [18,  73]. 

An  alternative  expression  for  the  TLS  solution  wTLg  in  Eq.  (2.30)  can  be  derived 
as  follows. 


C"Cvp+1  =  (V£Uw)(U£V")vp+1 
=  (V£2V")vp+] 

=  ^p+iVp+i  •  (2.31) 


Inserting  C  = 


A  b 


and  vp+1 


vp+  i 

(vp+i  )p+i 


into  Eq.  (2.31)  gives  the  expression: 


AHA  AHb 

vp+i 

2 

VP+1 

bHA  b"b 

(vp+i  )p+i 

ap+ 1 

(vp+l  )p+3 

Expanding  Eq.  (2.32)  gives  the  set  of  equations, 


(2.32) 
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{AhA  -  cTp+1I p)Vp+1  +  (vp+1)p+iA"b  =  0 
b/7AVp+1  +  (b7/b  -  ^+1)(vp+1)p+1  =  0 


(2.33) 

(2.34) 


But  if  (vp+])p+1  ^  0,  wTLg  =  ■■■"  ^jy  -4  i  so  Eq.  (2.33)  reduces  to 

(AhA  -  a2p+i Ip)wTLS  =  AHb  .  (2.35) 

If  (AHA  —  <jp+1Ip)  is  invertible,  the  alternative  expression  for  the  TLS  solution  is 

wTls  =  (A" A  -  (Jp+1  Ip) _1AHb  .  (2.36) 

2.3.2  Prony’s  Method  and  Total  Least  Squares 

The  TLS  solution  wTLs  is  the  maximum  likelihood  (ML)  estimate  for  Eq.  (2.15) 
only  if  the  errors  in  C  =  A  b  are  independently  and  identically  normally  dis¬ 
tributed  with  common  covariance  matrix  proportional  to  the  identity  matrix  with 
zero  mean  [35,  73].  Due  to  the  Toeplitz  structure  of  the  matrix  A,  the  errors  are  not 
independently  distributed,  so  the  TLS  solution  is  not  optimum.  However,  the  TLS 
solution  does  tend  to  reduce  the  effects  of  noise  in  the  linear  prediction  formulation, 
and  provides  improvements  over  the  LS  solution.  Rahman  and  Yu  [56]  applied  the 
TLS  method  to  the  linear  prediction  frequency  estimation  problem  and  demonstrated 
better  performance  than  the  LS-based  principal  eigenvector  (PE)  method  [69]  for  the 
same  prediction  order.  The  TLS  method  yielded  the  greatest  improvement  relative 
to  the  PE  method  at  minimal  prediction  orders,  although  both  solutions  improve 
with  higher  prediction  orders.  As  the  prediction  order  is  increased,  additional  corre¬ 
lated  errors  are  added  to  the  matrix  C,  reducing  the  benefit  of  the  TLS  method.  At 
maximal  prediction  order,  with  p  =  A  for  even  A,  both  the  TLS  and  PE  solutions 
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converge  to  the  same  performance. 

The  matrix  Z  in  Eq.  (2.17)  used  in  the  third  step  of  the  Prony  Method  for  deter¬ 
mining  the  parmeters  /q.  has  a  Vandermonde  structure  [19].  Assuming  that  relatively 
good  estimates  are  available  for  the  system  poles  Zk ,  the  major  source  of  error  will 
be  in  the  observation  vector  x.  Thus,  the  LS  solution  of  Eq.  (2.17)  appropriately 
accounts  for  errors  in  the  model. 


2.4  Structured  Total  Least  Squares  Approach 


Structured  Total  Least  Squares  (STLS)  is  a  natural  extension  to  the  TLS  approach 
when  the  same  observations  occur  in  multiple  rows  of  the  matrix  C  in  Eq.  (2.23).  In 


order  to  find  an  ML  estimate  of  w, 


AA  Ab 


needs  to  have  the  same  structure  as 


A  b 


[1].  This  leads  to  the  following  formulation  of  the  STLS  problem  [35]: 


minimize 

A  A,  Ab,w 


AA  Ab 


(2.37) 


A' 


such  that(A  +  AA)w  =  (b  +  Ab), 


and 


AA  Ab 


lias  the  same  structure  as 


A  b 


Many 


where  ||  •  ||*  denotes  the  l2  norm  defined  on  the  unique  entries  of  j^AA  Ab 
different  formulations  have  been  proposed  for  the  STLS  problem  involving  linearly 
structured  matrices:  the  Constrained  Total  Least  Squares  (CTLS)  approach  [1],  the 
Structured  Total  Least  Norm  (STLN)  approach  [60,  72],  and  the  Riemannian  Singular 
Value  Decomposition  (RiSVD)  approach  [13].  Each  approach  uses  iterative  numerical 
algorithms  to  find  the  solution,  but  all  of  them  suffer  from  inherent  multiple  local 
minima  that  occur  in  the  STLS  problem  [34].  When  the  noise  level  is  relatively  low, 
the  STLS  problem  is  not  difficult  to  solve,  and  simple  starting  values  will  suffice. 
However,  when  STLS  is  used  for  its  rank  reducing  properties  and  there  is  not  a 
solution  nearby  in  an  l2  norm  sense,  the  starting  values  need  to  be  chosen  with  care. 
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2.4.1  STLS  Solution  for  Hankel/Toeplitz  Matrices 


The  linear  prediction  relation  of  Eq.  (2.15)  can  be  written  with  a  Hankel  structure 
by  reordering  the  columns  of  matrix  A  and  reversing  w: 


Aw  rs  b  , 

z[l] 

x{2] 

x[p] 

*[2] 

i|3] 

x\p  +  1] 

[N-p] 

x[N  —  p  +  1]  . . 

.  x[N  -  1] 

w\p] 

i 

+ 
i — * 

i 

w[p  —  1] 

,  and  b  =  — 

x\p  +  2] 

.  w[l] 

(2.38) 


so  that  C  =  A  b  has  a  Hankel  structure.  The  solution  w  is  then  reversed  for 
determining  the  poles  Zk  in  Step  2  of  the  Prony  method.  The  Hankel  STLS  problem 
can  be  solved  using  the  Hankel  STLN  formulation: 


TV 


minimize  V(Ax[n])' 

Ax,  w  — / 


(2.39) 


n— 1 


such  that  (A  +  AA)w  =  (b  +  Ab), 


and 


AA  Ab 


has  a  Hankel  structure, 


where  A x[n]  for  1  <  n  <  N  are  the  unique  entries  of  the  Hankel  matrix  AA  Ab 
The  STLN  approach  solves  the  STLS  problem  as  a  nonlinear  optimization  problem 
with  nonlinear  constraints  [60,  72],  Lemmerling  and  van  Huffel  [35]  propose  the  fol¬ 
lowing  STLN  algorithm  for  solving  Eq.  (2.39): 
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STLN  Algorithm 

Input:  extended  Hankel  data,  matrix  [A  b]  G  ]Rmx(n+1)  (m  >  n)  of  full  rank  n  +  1 
and  identity  weighting  matrix  lm+n 

Output:  the  parameter  vector  w  €  RnXl  and  vector  Ax  G  R(m+n)xl  composed  of 
the  unique  entries  of  the  matrix  [AA  Ab] 


Step  1:  Initialize  Ax,  w,  and  Lagrange  multiplier  vector  7  G  Rmxl 
Step  2:  While  stop  criterion  not  satisfied 

Step  2.1:  Solve  the  following  system  of  equations: 


r  -1 

Ax 

r  1 

S  Jr 

Aw 

—  — 

g  +  J7  7 

J  0 

r(  Ax,  w) 

L_ 

A7 

Step  2.2:  Ax  <—  Ax  +  Ax 

w  <—  w  +  Aw 

7^-7  +  A  7 


End 


where  S 


Im+n  0 

G  ]]j(m+2n)x(m+2n)  j  _ 

W  A  +  AA 

1 — 

0 

0 

J 

G  R™x<™+2")  is  the 


Jacobian  of  the  constraints  r(Ax,  w)  in  Eq.  (2.39), 


Im+n  0 

Ax 

0  0 

Aw 

r(  Ax,  w)  =  (A  +  AA)w  —  (b  +  Ab)  , 

G  R(m+2n)xl  is  the  gradient  of  the  objective  function  in 
Eq.  (2.39),  and  W  G  Rmx(™+«)  is  defined  by 

[  w 

WAx  =  [AA  Ab] 

'  -1 
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which  for  the  Hankel-structured  matrix  [AA  Ab]  has  the  form 

w\p]  ...  w[l]  —1  0  0 

0  w[p]  ...  w[l]  —10  : 

w  ;  ; 

o  o 

0  .  0  w\p]  ...  ty[l]  —1 

The  stop  criterion ,  chosen  based  on  the  application,  tests  for  convergence  of  the  STLN 
algorithm.  With  each  iteration,  the  algorithm  updates  parameter  estimates  for  Ax 
and  w  in  an  attempt  to  drive  the  constraint  r(  Ax.  w)  to  zero.  If  the  iterative  solution 
approaches  close  to  one  of  many  local  minima,  the  algorithm  will  not  converge  to  the 

actual  STLS  solution.  The  system  of  equations  in  Step  2.1  is  solved  using  the  LDLT 

S  JT 

factorization  of  the  matrix 

J  0 

A  natural  choice  for  the  initialization  parameters  in  the  STLN  algorithm  would  be 
to  set  A  ^initial  =  0,  7  =  0,  and  wmitml  =  wLS  or  wTLS.  It  turns  out  that  winitial  = 
wls  is  the  better  choice,  but  both  take  a  large  number  of  iterations  for  the  STLN 
algorithm  to  converge  to  a  solution  that  is  often  a  local  minima.  Lemmerling  et  al.  [34] 
propose  a  better  initialization  procedure  based  on  the  Hankel  Total  Least  Squares 
(HTLS)  subspace  algorithm  developed  for  Nuclear  Magnetic  Resonance  (NMR)  data 
fitting  [71].  The  HTLS  algorithm  is  suboptimal  in  the  sense  that  while  it  gives  a 
good  estimate  of  the  solution,  it  is  not  the  closest  rank-deficient,  Hankel  matrix  to 
A  b  •  The  STLN  algorithm  is  then  initialized  close  to  the  global  solution  for  the 
STLS  problem  using  the  values  Ax,mtlfl(  =  AxHtls,  7  =  0,  and  winitial  =  wHtls- 
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HTLS  Algorithm  Description 


The  HTLS  algorithm  [70]  is  based  on  the  fact  that  Eq.  (2.14)  can  be  modeled  by  an 
autonomous  linear  state-space  model  of  order  p, 


y[n  +  1]  =  By[n] 

x[r>]  =  h7y[n]  +  e[n], 


(2.40) 


where  y[n]  is  a  complex  state  vector,  h7  is  a  complex  row  vector,  and  rr[n]  are  noisy 
observations  with  observation  error  e[n]  =  .r[n]  —  x[n].  Equating  Eq.  (2.14)  and 
Eq.  (2.40)  for  1  <  n  <  N  yields 

x[n]  =  h7 B— xy[l]  =  hkzl~\  (2.41) 

fc=i 


where  x[n]  has  zero  observation  error,  and  defines 


Essentially,  the  modern  Prony  method  is  described  in  a  state-space  model  which  is 
used  to  estimate  the  parameters  Zk  and  /q..  A  Hankel  matrix  H  G  R(LxA1\  as  square 
as  possible  for  best  parameter  accuracy  [71],  such  that  L  =  M(+ 1)  ~  N/2,  is  formed 
using  the  N  data  points, 

x[l]  x[2]  . . .  x[M] 

x[2]  x[3]  . . .  x[M  +  1] 

x[L]  x[L  +  1]  ...  x[N] 
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If  the  observation  error  e[n]  is  zero,  H  decomposes  into  an  observability  matrix  O 
and  a  controllability  matrix  C  given  by: 


H  =  OC  = 


hr 

hTB 


y  [1]  By  [1]  ...  BM~1y[l] 


h7BL-1 


(2.42) 


In  practice,  the  observation  error  in  Eq.  (2.41)  is  non-zero.  Hp,  the  SVD  reduced-rank 
approximation  of  H,  is  computed  as 


H 


p 


UP£„V" 


(2.43) 


where 


H  LxM  ~  U  LxL^LxM^MxM  • 

S  diag(<7i ,  .  .  .  ,  <7rnin(L,A/ ) )  1  ^  ^  ^min (L,M) : 

and  Up,  and  Vp  are  the  first  p  columns  of  U,  £,  and  V.  Hp  is  used  to  estimate 
O  and  C  up  to  a  similarity  transformation  matrix  S, 

Hp  =  TJpVp  ((9S~1)(SC)  ,  (2.44) 

^  A  l 

where  Up  =  Up  and  Vp  =  \pT,p  if  unbalanced  splitting  is  used,  and  Up  =  U PEP  and 

A  \ 

Vp  =  VpEp  if  balanced  splitting  is  used.  Substituting  B  =  S_1QS  into  Eq.  (2.44), 
where  Q  =  SBS^1  has  the  same  eigenvalues  as  B,  yields: 
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H  p  =  UPV" 


hrs-] 

t^S^Q 


hTS~lQL*1 


Sy[l]  QSy  [1]  ...  QM-1Sy[l] 


(2.45) 


The  TLS  solution  Qtls  is  computed  for  the  incompatible  set 


UpQ  ~  Up  ,  (2.46) 

where  Up  and  Up  are  derived  from  Up  by  omitting  its  first  and  last  row, 


hTS_1Q 

I 

13- 

in 

\ 

_ 1 

p 

II 

hTS-]Q2 

and  Up  — 

hTS-1Q 

hrs-iQL-2 

Provided  V22  is  non-singular,  the  TLS  solution  is  given  by 


Qtls  —  ~  V]2(V 


22 ) 


(2.47) 


in  which  V12  and  V22  are  obtained  from  the  SVD  of  the  augmented  matrix 


tH 


—  U(L_i)x(L-pSV2px2p 


(2.48) 
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where 


~  _  Vn  V12  P 

V21  V22  p  . 

P  P 

If  V22  is  close-to  singular  in  Eq.  (2.47),  it  is  replaced  by  its  pseudo-inverse  V^.  The 
system  pole  estimates  zk  are  equal  to  the  eigenvalues  of  Qtls-  It  is  not  necessary  to 
find  the  similarity  transformation  matrix  S.  Finally  the  parameter  estimates  hk  are 
obtained  by  inserting  the  pole  estimates  zk  into  Eq.  (2.17), 

hLS  =  (ZHZ)-1ZHx  .  (2.49) 

STLS  Initialization  using  HTLS 

Once  the  estimates  zk  and  hk  are  obtained  using  the  HTLS  Algorithm  with  unbalanced 
splitting  in  Eq.  (2.44),  the  resulting  adjusted  data  values  are  calculated  as 

p 

(x[n]  +  A  XffTLs[n})  =  hkzl~l  ,  (2.50) 

k=  1 

from  which  the  initial  values  for  AAhtls,  AbnTLS,  and  whtls  in  Eq.  (2.39)  are 
found. 

HTLS  Algorithm 

Input:  extended  Hankel  data  matrix  [A  b]  G  Mmx^n+1i  (m  >  n)  of  full  rank  n  +  1 
Output:  extended  Hankel  noise  data  matrix  [AAHtls  AbHTLs]  and  parameter 
vector  whtls.  such  that  [A  +  AAhtls  b  +  AbHTLs]  is  a  rank-deficient  Hankel 
matrix. 

Step  1:  y  <—  [A(:,  l)7  A(m,  2  :  n)  b(m)]T 
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Step  2:  M  <—  ceil((m  +  n)f 2) 

Step  3:  H  -f—  hankel( y(l  :  m  +  n—  M  +1),  y (m  +  n  —  M  +  1  :  m  +  n)) 

Step  4:  Calculate  the  left  singular  vectors  U(:,i),  i  —  1, . . . ,  n  of  H, 
corresponding  to  the  n  largest  singular  values 
Step  5:  Calculate  the  TLS  solution  of  the  system 
U(2  :  M,  1  :  n)Q  ~  U(1  :  M  -  1, 1  :  n). 

The  eigenvalues  of  Q  are  the  estimated  signal  poles  zu  l  =  T  . . .  ,n 
Step  6:  Solve  the  complex  amplitudes  /p,  l  =  1.. . . ,  n,  from  the  system  of  equations: 

t/(fc)  ~  /qif,  fc  =  1, . . . ,  m  +  n 
Step  7:  y(k)  +-  JZILi  A:  =  1, . . . ,  m  +  n 

Step  8:  [AAhtls  AbHTLs]  hankel( y(l  :  m),  y(m  :  m  +  n))  -  [A  b] 

Step  9:  Solve  the  square  system 

(A(l  :  n,  1  :  n)  AA(1  :  n,  1  :  n))wHTLS  =  b(l  :  n)  +  Ab(l  :  n) 


The  STLN  algorithm  is  then  initialized  using  Axhtls  and  wHXLS-  The  improved 
initialization  procedure  enhances  both  the  solution  quality  and  calculation  time  by 
starting  the  iterative  search  routine  close  to  the  global  minimum  for  the  Hankel 
STLS  problem  [34].  Lemmerling  et  al.  [33]  demonstrated  the  improved  accuracy  of 
the  STLN  algorithm  using  HTLS  parameter  initialization  in  a  speech  compression  ap¬ 
plication.  Even  with  the  improved  HTLS  initialization  procedure,  the  computational 
load  of  the  STLN  algorithm  is  large  compared  to  standard  speech  coding  algorithms. 
Various  methods  have  been  used  to  produce  faster  STLS  algorithms  [44],  but  current 
algorithms  are  are  still  too  slow  for  real-time  application. 
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Chapter  3 


Simulation  Results 


As  described  in  Chapter  1.  the  different  techniques  for  modeling  acoustic  signals  based 
on  the  sinusoidal  superposition  model  of  Eq.  (1.1)  differ  primarily  in  the  method 
by  which  the  interpolation  of  amplitude  and  phase  contours  is  performed  between 
analysis  blocks.  Frequency  estimation  is  generally  performed  by  taking  the  Fourier 
transform  (DFT)  of  a  windowed  block  of  data  of  length  N  samples,  with  N  =  512 
being  common  in  practice,  although  some  algorithms  adaptively  vary  N.  Different 
windowing  functions  are  used  to  provide  better  spectral  peak  estimation  performance. 
The  data  sample  advance  between  analysis  blocks,  known  as  the  hop  size  FT,  is  usually 
chosen  to  have  some  overlap  between  blocks  to  produce  smoother  results  across  time 
at  the  expense  of  higher  computational  loading  [64],  Choosing  H  =  1  is  generally 
not  used  since  parameters  are  assumed  to  be  slowlv-varying  and  accumulation  of 
excess  data  is  not  desirable  [17].  In  most  applications,  data  storage  is  an  important 
design  criteria,  and  while  optimal  synthesis  quality  is  desired,  some  amount  of  signal 
compression  is  acceptable. 

In  the  case  of  modeling  marine  mammal  whistle  calls,  computational  loading  and 
signal  compression  is  not  a  design  criteria  in  generating  high-quality  synthesis  mod¬ 
els.  Since  the  frequency  contours  of  a  marine  mammal  whistle  call  vary  with  time,  a 
method  of  closely  tracking  the  frequency  contour  is  desired  to  improve  the  synthesis 
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quality.  This  is  achieved  by  using  a  hop  size  of  H  =  1  and  reducing  the  effective 
window  size  by  applying  the  parametric  approach  of  linear  prediction  to  estimate  the 
instantaneous  frequency.  Based  on  the  harmonic  structure  of  marine  mammal  whis¬ 
tle  calls,  estimation  of  the  fundamental  frequency  contour  should  provide  adequate 
estimates  of  higher  harmonics,  as  assumed  in  [5].  In  a  communications  scenario, 
good  frequency  tracking  performance  is  desired  even  at  relatively  low  SNR  to  ensure 
capability  of  reconstructing  the  embedded  watermark. 

The  rest  of  this  chapter  proceeds  as  follows.  The  algorithm  for  tracking  the  funda¬ 
mental  frequency  contour  and  amplitude  contours  using  weighted  STLS  and  Prony’s 
method  is  described.  Simulation  results  are  presented  for  tracking  frequency  con¬ 
tours  of  chirp  signals  with  constant  amplitude,  and  are  compared  to  other  frequency 
estimation  methods.  Finally,  simulation  results  are  presented  for  tracking  both  the 
frequency  and  amplitude  of  a  chirp  signal  with  variable  amplitude  harmonics. 


3.1  Algorithm  Description 


The  algorithm  applies  a  sliding  block  window  of  size  M  samples  to  a  harmonically 
structured  whistle  recording  s[n],  where  p  =  2 R  is  the  model  order,  R  is  the  number 
of  harmonics  in  s[n],  and  M  —  p  is  the  number  of  linear  prediction  equations  used  to 
estimate  the  AR  parameters  of  s[n].  Thus,  s[n]  is  modeled  as 


s\n\ 


y~"  ar[n ]  cos  I  27r <f>r[n]  +  6r  j  +  v[n],  for  1  <  n  <  N, 

r- 1  '  ' 


(3-1) 


or  explicitly  writing  each  exponential  component, 


R 


snl 


E 


ar\n\ 


exp  (jf)r) 


exp 


yj2n<f>r 


n 


+  exp 


j2,lT(f)r 


n 


+  v\n\ 


(3.2) 
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where  fr[n]  is  the  instantaneous  frequency  of  the  rth  harmonic  at  time  n  such  that 

n 

0r[n]  =  fr[i}/ fs  ,  (3.3) 

i- 1 

aT[n]  is  the  amplitude  of  the  rth  harmonic  at  time  n,  6r  is  the  initial  phase  of  the  rth 
harmonic,  fs  is  the  sample  rate,  and  v [n]  is  additive  ambient  noise.  The  /th  analysis 
block,  using  a  hop  size  of  H  =  1,  is  expressed  as 

xi[m]  =  W[m]s[m  +  l  —  1],  (3.4) 

for  1  <  l  <  L  =  N  -  M  +  1, 
and  1  <  m  <  M, 

where  W \rn] ,  discussed  on  page  51,  is  a  window  of  length  M  applied  to  the  data. 
Setting  up  the  first  step  of  Prony’s  method  using  the  Hankel  structure  in  Eq.  (2.38) 
gives 


A/W;  «  b;  , 

Xi[  1]  Xi[  2] 

Xi[  2]  Xi[  3] 

Xi  [M  —  p]  xi  [M  —  p  +  1] 


xi\p\ 

Xi\p+l] 

xi[M  -  1] 


Wi[p] 

xi\p+  1] 

Wi[p  -  1] 

,  and  b;  =  — 

xi[p  +  2] 

n[i] 

_  xt[M]  _ 

(3.5) 


Eq.  (3.5)  is  solved  using  the  STLS  method  if  v[n]  ^  0,  but  in  simulations  where 
v[n]  =  0,  the  LS  method  is  sufficient.  The  system  pole  estimates  Zk,i  are  then  found 
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as  the  roots  of  the  polynomial 


p 

4>i(z)  =  zp  +  wi[k]zp~k  ,  (3.6) 

k=l 

keeping  in  mind  that  w;  is  written  in  reverse  order  when  A /  has  a  Hankel  structure. 
In  the  presence  of  noise,  the  poles  zki  fluctuate  back  and  forth  across  the  unit  circle  as 
the  analysis  block  x;  moves  through  the  data,  giving  a  better  frequency  estimate  than 
if  the  poles  were  constrained  to  be  on  the  unit  circle.  However,  the  underlying  model 
in  Eq.  (3.1)  assumes  that  the  original  dolphin  whistle  has  an  undamped  sinusoidal 
structure,  so  only  the  frequency  component 


fk,l 


Is  f  _! 
—  tan 
27 r 


U  e{Zkj}J 


(3.7) 


is  retained  while  scaling  the  pole  estimates  to  the  unit  circle,  i.e. , 


-k,i  — 


~k,i 

?kj  I 


(3.8) 


In  the  STLN  formulation  [34],  the  HTLS  algorithm  is  used  to  initialize  the  iterative 


A,  b, 


However,  simulation 


search  for  the  closest  rank-deficient.  Hankel  matrix 
results  show  that  both  the  STLN  [35]  and  extended  structured  least  squares  (ES- 
TLS)  [75]  algorithms  do  not  improve  upon  the  frequency  estimate  fkj  provided  by 
the  HTLS  algorithm.  Thus,  the  poles  zkj  are  found  as  the  normalized  eigenvalues  of 
the  matrix  Qtls,;  (Eq.  (2.47)).  The  pole  estimates  zk,i  are  then  used  in  Step  3  of  the 
Prony  method  to  calculate  the  parameters  hkj  using  Eq.  (2.17), 


h,  -  (Zf  %) 


~lZH 


i 


(3.9) 
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where 


1 

1 

1 

~hu 

X/[l] 

Z,  = 

Z\J 

hi  ■ 

•  zp,l 

,  h  i  = 

hi 

,  and  Xi  = 

xi  [2] 

-A/-1 

_~1,Z 

~M- 1 
z2,l 

-A/-1 

~P,l 

hp-i 

x/[  M] 

The  least  squares  estimate  of  the  amplitude  of  the  Arth  harmonic  exponential  is 

Ak,i  =  |M  ,  (3.10) 


meaning  that  for  each  analysis  block,  the  amplitudes  are  chosen  to  minimize  the 
residual  mean  square  error  (MSE)  between  the  sinusoidal  model  and  the  observed 
data. 

An  important  aspect  of  this  approach  is  selecting  the  window  W[m]  and  measuring 
the  corresponding  estimation  delay  between  the  leading  edge  of  the  analysis  window 
and  the  effective  estimation  point  of  the  algorithm.  Since  there  is  not  currently  a 
recursive  implementation  of  the  STLS  method,  the  type  of  window  is  restricted  to  a 
constant-length  analysis  of  the  data,  known  as  a  sliding  window  approach.  In  general, 
the  window  that  is  chosen  is  an  exponential  sliding  window, 


W[m  = 


A M~m  1  <  m  <  A/,  where  0  <  A  <  1 
0  elsewhere. 


(3.11) 


If  A  =  1,  W  is  a  rectangular  window.  For  0  <  A  <  1,  the  weights  decay  at  an  exponen¬ 
tial  rate,  gradually  decreasing  the  effect  of  old  data  on  current  parameter  estimates, 
which  is  why  A  is  called  the  forgetting  factor  [22],  The  resulting  rectangular  and  ex¬ 
ponential  sliding  window  approaches  using  STLS  are  analogous  to  the  sliding  window 
least  squares  (SWLS)  and  exponentially  weighted  least  squares  (EWLS)  approaches 
compared  by  Niedzwiecki  [47].  For  estimators  with  the  same  effective  window  length, 
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EWLS  has  better  parameter  tracking  characteristics  due  to  the  window’s  higher  de¬ 
gree  of  concentration  at  the  leading  edge  of  the  window,  while  the  rectangular  SWLS 
has  better  parameter  matching  properties  due  to  the  linearity  of  its  phase  charac¬ 
teristic.  Essentially,  reducing  the  forgetting  factor  A  allows  the  algorithm  to  track 
fast  parameter  changes  better,  but  lowers  the  estimation  accuracy  attainable  when 
parameters  are  slowly-varying.  In  terms  of  AR  modeling,  the  exponential  window 
applies  an  artificial  damping  factor  to  the  data  in  order  to  improve  tracking  perfor¬ 
mance,  causing  the  corresponding  system  poles  to  shift:  to  Zk  ~  Zk/ A.  The  linear 
prediction  relation  in  Eq.  (3.5)  can  also  be  applied  in  the  backward  direction  with 
respect  to  time.  For  a  sinusoid  with  poles  on  the  unit  circle,  choosing  A/  >  1  in  the 
forward  direction  scales  the  system  poles  within  the  unit  circle  and  is  the  same  as 
choosing  A^  =  1/A/  in  the  backward  direction. 


The  effective  sample  estimation  point  te  of  the  analysis  window  is  the  weighted 
time  average  of  the  window  VE[m]  for  which  a  linear  prediction  equation  is  valid,  i.e. 
p  +  1  <  m  <  M, 


f  Z*LP+1  rn\V[m] 

6  Em=P+i  W[m) 
mXM~m 

r*f  \  M—m 

Zsm=p+ 1  ^ 

The  corresponding  sample  estimation  delay  re  is 


(3.12) 


Te  =  M  - 


(3.13) 


and  the  effective  window  length  is  lejf  ~  2 re.  Taking  advantage  of  knowing  the  point 
in  time  iej  for  which  an  estimate  fkj,  is  valid,  where 

tei  =  M  +  l  —  1  —  Te  —  te  +  l  —  1,  (3.14) 
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a  more  localized  estimate  of  the  amplitude  contours  in  Eq.  (3.9)  can  be  made  by 
contracting  X;  about  tej  and  reducing  the  number  of  rows  in  Z/.  The  weighted  average 
frequency  for  the  Zth  analysis  block, 


+  i]A^- 

\M—m 
£sm—p+ 1  7 


(3.15) 


where  fk[n]  for  1  <  n  <  N  is  the  underlying  frequency  contour  for  the  fcth  exponential, 
provides  a  measure  of  the  smoothing  effect  of  the  sliding  window.  However,  fkj  will 
usually  track  closer  to  /r[n]  than  j\.j  when  the  frequency  contour  changes  faster  than 
linearly. 


3.2  Frequency  Tracking  of  Chirp  Signals 

This  section  presents  simulation  results  demonstrating  the  ability  to  track  the  fre¬ 
quency  of  harmonic  chirp  signals  in  the  presence  of  white  noise,  and  comparison  is 
made  with  other  frequency  estimation  methods.  The  simulated  chirp  whistles  are 
constructed  according  to  Eq.  (3.1)  and  Eq.  (3.3)  with  ar[n ]  =  1  for  all  n,  6r  =  0, 
N  =  500  samples,  fs  =  100  kHz,  v[n)  is  additive  white  gaussia.n  noise  with  variance 
c?l  such  that  SNR  =  5  dB  unless  specified  otherwise,  and  fr[n ]  is  specified  for  each 
chirp.  Unless  otherwise  specified,  the  algorithm  parameters  are  chosen  as  A  =  1, 
M  —  101,  and  p  =  2 /?,  with  the  chirp  having  R  harmonics.  In  the  following  figures, 
f htls  represents  the  positive  frequency  estimate  fk  of  fr  obtained  using  the  HTLS 
algorithm  and  j avg  is  the  weighted  average  frequency  for  each  analysis  block,  fk. 

3.2.1  Single  Harmonic  Linear  Chirp 

Fig.  3-1  demonstrates  the  frequency  estimation  and  tracking  performance  of  the  HTLS 
algorithm  for  a  linear  chirp  with  f\[n]  =  8000  +  2 (n  —  1)  (Hz)  for  1  <  n  <  N.  The 
resulting  frequency  estimate  is  essentially  unbiased,  which  can  be  seen  graphically 
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after  adjusting  for  the  estimation  delay,  where  re  =  49  samples  in  this  example. 


Figure  3-1:  HTLS  frequency  tracking  performance  for  a  linear  chirp  (SNR  =  5  dB) 


3.2.2  Double  Harmonic  Linear  Chirp 

Fig.  3-2  demonstrates  the  frequency  estimation  and  tracking  performance  of  the  HTLS 
algorithm  for  a  linear  chirp  with  two  harmonics  ( R  =  2),  fi[n]  =  8000  +  2(n  —  1)  (Hz) 
and  /2[n]  =  16000  +  4(n  —  1)  (Hz)  for  1  <  n  <  N. 
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Figure  3-2:  HTLS  frequency  tracking  performance  for  a  linear  chirp  with  two  har¬ 
monics  (SNR  =  5  dB) 


3.2.3  Single  Harmonic  Linear  Chirp  with  Abrupt  Frequency 
Shifts 

Fig.  3-3  demonstrates  the  frequency  estimation  and  tracking  performance  of  the  HTLS 
algorithm  for  a  linear  chirp  with  an  abrupt  frequency  shift  of  250  Hz, 

{8000  +  2.5(n  -  1)  for  1  <  n  <  250, 

(3.16) 

7750  +  2.5(n  -  1)  for  251  <  n  <  500. 
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Figure  3-3:  HTLS  frequency  tracking  performance  for  a  linear  chirp  with  abrupt 
frequency  shifts  (SNR  =  5  dB) 


Figure  3-4:  HTLS  frequency  tracking  performance  vs.  A  (SNR  =15  dB) 
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Fig.  3-4  shows  how  the  tracking  performance  of  the  HTLS  algorithm  is  improved 
by  lowering  the  forgetting  factor  A  at  the  expense  of  estimation  accuracy.  To  clearly 
demonstrate  the  tradeoff  between  tracking  performance  and  estimation  error,  an  SNR 
of  15  dB  and  a  frequency  shift  of  500  Hz  are  simulated,  where 

{8000  +  5 (n  -  1)  for  1  <  n  <  250. 

(3.17) 

7500  +  5(n  -  ] )  for  251  <  n  <  500. 

In  the  case  where  A  =  0.9,  the  transition  between  the  linear  chirp  segments  is  much 
sharper  than  for  A  =  1  due  to  the  shorter  effective  window  length.  The  corresponding 
estimation  point  te  is  closer  to  the  leading  edge  of  the  analysis  window,  which  shifts  the 
frequency  estimation  region  toward  the  end  of  the  signal.  The  increased  estimation 
error  variance  would  preclude  using  A  ^  1  for  most  frequency  estimation  problems, 
unless  it  was  necessary  to  detect  abrupt  frequency  shifts. 

3.2.4  Single  Harmonic  Linear  +  Sinusoidal  Chirp 

Fig.  3-5  demonstrates  the  frequency  estimation  and  tracking  performance  of  the  HTLS 
algorithm  for  a  chirp  with  a  combined  linear  and  sinusoidal  frequency  contour,  f\[n]  — 
8000  +  2(n  —  1)  +  500sin(-'”(~1^)  (Hz)  for  1  <  n  <  500.  The  frequency  estimation 
error  becomes  biased  at  peaks  in  the  underlying  frequency  contour,  fi[n],  due  to 
the  smoothing  effects  of  the  analysis  window.  However,  the  frequency  estimator 
tracks  closer  to  f\[n]  than  the  weighted  average  frequency  for  each  analysis  window. 
Thus,  while  peaks  in  the  actual  frequency  contour  are  not  fully  resolved  due  to  the 
estimation  bias,  the  existence  of  peaks  in  the  frequency  contour  can  be  detected  by  the 
HTLS  algorithm  with  a  sliding  window.  If  needed,  the  actual  peaks  could  be  resolved 
with  better  accuracy  by  removing  the  smoothing  effects  of  the  analysis  window  by 
deconvolution.  In  regions  where  the  frequency  contour  is  close  to  linear,  the  HTLS 
frequency  estimate  is  practically  unbiased. 
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Figure  3-5:  HTLS  frequency  tracking  performance  for  sinusoidal  chirp  (SNR  =  5  dB) 


3.2.5  Comparison  with  Alternative  Frequency  Estimators 


In  [43],  Marple  discusses  the  important  difference  between  spectral  estimation,  which 
attempts  to  match  the  spectrum  of  a  signal  over  a  continuous  range  of  frequencies, 
and  frequency  estimation,  which  is  only  concerned  with  the  behavior  of  the  spectrum 
local  to  a  specific  frequency.  Kay  [28]  reviews  the  sinusoidal  parameter  estimation 
problem,  showing  how  the  ML  estimate  of  the  frequency  of  a  single  complex  sinusoid  in 
complex  additive  white  Gaussian  noise  is  found  by  choosing  the  frequency  at  which  the 
periodogram  is  maximized.  The  Cramer-Rao  lower  bound  (CRLB)  for  the  unbiased 
frequency  estimator  of  a  single  complex  exponential  of  the  form 

s[n]  =  Ai  exp[j(2'Rj\n  +  ^h)]  +  v[n\,  for  1  <  n  <  N,  (3.18) 
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with  unknown  parameters  A\ ,  j\ ,  and  0\ ,  and  complex  white  Gaussian  noise  v[r>]  with 
variance  a2,  was  shown  by  Rife  and  Boorstyn  [57]  to  be 


var(/i)  > 


6cr2 


71?  (27r)2N(N2  —  1] 


For  a  single  real  sinusoid, 


(3.19) 


s[n]  =  A\  cos(27r/1n  +  6\)  +  v[n 


.4 


—  (  exp[j(27r/1n  +  R,)]  +  expf-j^Tr/jn  +  0i)]  )  +  v[n], 


(3.20) 


for  1  <  n  <  N,  the  frequency  CRLB  [58]  is 


var  (A)  > 


6<t2 


Aj  tt2n(n2  - 1; 


(3.21) 


When  estimating  the  unknown  parameters  of  a  single  complex  exponential  linear 
chirp  sequence,  the  CRLB  of  Fxj.  (3.19)  applies  to  the  center  frequency  of  the  analysis 
window  [16].  Extending  to  real  linear  sinusoidal  chirp  signals,  the  CRLB  of  Eq.  (3.21) 
also  applies  to  the  center  frequency  of  the  analysis  window  [58]. 

Quinn  and  Hannan  [55]  present  different  classes  of  frequency  estimators  that  can 
be  compared  with  the  HTLS  algorithm  for  linear  chirp  signals.  Fig.  3-6  shows  the 
performance  of  some  of  these  frequency  estimators  compared  to  the  CRLB  for  the 
linear  chirp  in  Eq.  (3.1),  with  R  =  1,  ajn]  =  A\  =  1  and  fi[n\  =  8000  +  (n  —  1)  (Hz) 
for  1  <  n  <  N,  N  =  1100,  fs  =  100  kHz,  and  0\  =  0.  The  HTLS  frequency  estimate 
was  computed  using  a  rectangular  window  (A  =  1)  of  length  M  =  101  and  a  model 
order  of  p  =  2.  SNR  is  defined  as 


SNR  = 


2  *1 


(3.22) 
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The  MSE  for  each  frequency  estimator  is  computed  as 


MSE  =  — 

J  Lj 


(3.23) 


where  J  =  5  is  the  number  of  independent,  trials  performed  for  each  chirp,  L  = 
1000  is  the  number  of  frequency  estimates  computed  for  each  trial,  and  ft  is  the 
center  frequency  of  the  Zth  analysis  window  for  a  rectangular  window.  Each  of  the 
frequency  estimators  applies  the  same  sliding  rectangular  window  to  the  data  to 
obtain  a  frequency  estimate  fij  for  each  analysis  window  and  trial.  The  corresponding 
CRLB  is 

-  S N R  it2 M (M2  -  1) 

The  FTI  frequency  estimator,  using  the  FT  I  3  algorithm  of  [55],  performs  an  in¬ 
terpolation  about  the  maximiser  of  the  periodogram  using  three  Fourier  coefficients. 


Figure  3-6:  Linear  chirp  frequency  estimator  performance  vs.  CRLB 
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Macleod  [40]  has  developed  alternative  techniques  based  on  the  same  approach.  The 
RIFE  frequency  estimator  is  an  older  approach  by  Rife  and  Vincent  [59]  based  on 
quadratic  interpolation  of  the  moduli  of  Fourier  coefficients  to  reduce  data  storage 
requirements.  The  QUINN  frequency  estimator  is  an  AR-based  iterative  algorithm 
developed  by  Quinn  and  Fernandes  [54].  The  multiple  signal  characterization  (MU¬ 
SIC)  frequency  estimator  developed  by  Schmidt  [63]  is  based  on  eigenanalysis  of  the 
noise  subspace. 

Each  of  the  frequency  estimators  in  Fig.  3-6  was  developed  for  quasi-stationary 
signals  for  which  the  frequency  could  be  considered  constant  in  each  analysis  window. 
Even  though  the  estimators  are  used  in  an  unconventional  manner  when  analyzing 
linear  chirps,  they  provide  a  baseline  to  gauge  the  performance  of  the  HTLS  algorithm. 
As  the  SNR  increases  above  10  dB.  the  HTLS  algorithm  increasingly  outperforms  the 
other  frequency  estimators  and  nearly  achieves  the  CRLB  for  an  unbiased  estimator. 
Between  0  and  5  dB,  the  QLIINN  frequency  estimator  outperforms  the  HTLS  and 
FTI  estimators  due  to  an  inherent  bias  that  worsens  performance  at  higher  SNR.. 
The  faster  FTI  frequency  estimator  achieves  nearly  the  same  performance  and  can 
be  considered  as  an  alternative  to  HTLS  at  lower  SNR. 

A  lot  of  research  has  been  done  on  joint  ML  frequency  and  chirp  rate  estimation 
of  linear  chirp  signals  with  short  data  lengths.  Djuric  and  Kay  [16]  proposed  similar 
estimators  based  on  their  ease  of  on-line  or  off-line  implementation  that  achieve  the 
CRLB  at  SNR  above  8  dB,  with  SNR  defined  as  (-£)  for  a  single  complex  sinusoid. 
Liang  and  Arun  [37]  use  a  method  very  similar  to  the  HTLS  algorithm  with  balanced 
splitting  to  initialize  a  search  for  the  ML  parameter  estimates  of  multiple  superim¬ 
posed  chirp  signals,  with  simulation  results  attaining  the  CRLB  at  SNRs  above  10 
dB.  Saha  and  Kay  [61]  propose  using  importance  sampling  to  maximize  a  compressed 
likelihood  based  on  frequency  and  chirp  rate  to  implement  joint  ML  parameter  esti¬ 
mation  of  superimposed  chirp  signals,  demonstrating  simulation  results  that  achieve 
the  CRLB  at  SNRs  above  3  dB.  At  low  enough  SNR,  all  of  the  frequency  estimators 


61 


depart  sharply  from  the  CRLB,  as  seen  in  Fig.  3-6  below  an  SNR  of  3  dB.  Ultimately, 
Fig.  3-6  demonstrates  that  the  HTLS  algorithm  can  be  used  to  nearly  optimally  track 
the  frequencies  of  chirped  signals. 


3.3  Amplitude  Estimation  of  Chirp  Signals 


This  section  presents  simulation  results  demonstrating  the  ability  of  the  Prony  method 
to  estimate  the  amplitudes  of  a  double  harmonic  linear  chirp  signal  based  on  frequency 
estimates  obtained  using  the  HTLS  algorithm.  As  with  Section  3.2,  the  simulated 
chirp  whistle  is  constructed  according  to  Eq.  (3.1)  and  Eq.  (3.3)  with  N  =  500 
samples,  fs  —  100  kHz,  9r  —  0,  and  v[n]  is  white  Gaussian  noise  with  variance  o\. 
The  frequency  and  amplitude  contours  are  defined  as 


fr[n]  = 


8000  +  2(n  -  1 ; 


r  =  1 


16000  +  4(n  -  1),  r  =  2 


(3.25) 


and 


|(1  +  tukey[n]),  r=l 


ar[n\ 


(3.26) 


f(l  +  tukey[n]),  r  =  2 

for  1  <  n  <,  where  tukey[n]  is  the  N  point,  cosine-tapered  Tukey  window  [20]  with 
parameter  a  =  0.5  shown  in  Fig.  3-7.  The  HTLS  algorithm  parameters  are  chosen 
as  A  =  1,  M  =  101,  and  p  =  4.  The  harmonic  chirp  amplitude  estimates  are  found 
from  a  reduced  version  of  Eq.  (3.9)  by  using  W  =  20  data  points  centered  at  the 
estimation  point  te  /  for  the  /th  analysis  window, 


R  =  {Z?  ZO-'Zfx, 


(3.27) 
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Figure  3-7:  Tukey  window  with  a  =  0.5 


where 


1  1 

1 

hi,; 

xl[[t-e,l  ~  Y  +  lj] 

Z  ,= 

Z\  ,Z  z2,l 

ZP,l 

,  h,  = 

h‘2 ,/ 

,  and  X;  = 

xl[[te,l  -  \  +  2J] 

~W—1  ~W- 1 

_Z1 J  ~'2,l 

1 

7 

hp,; 

xi[[ki-T  +  WW. 

This  is  done  to  limit  the  effect  of  time-varying  frequency  and  amplitude  parameters 
within  the  analysis  window  while  providing  sufficient  averaging  to  reduce  the  error 
variance. 

Fig.  3-8  compares  the  estimated  amplitude  contours  ais  to  the  actual  contours 
in  Eq.  (3.26)  for  an  SNR  of  50  dB.  There  are  two  noticeable  factors  which  increase 
the  amplitude  estimation  error  at  relatively  high  SNRs.  First,  even  in  regions  of  con¬ 
stant  harmonic  amplitudes,  the  second  harmonic  amplitude  estimate  shows  greater 
deviation  from  the  known  contour.  Rife  and  Boorstyn  [58]  show  that  for  multiple 
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Figure  3-8:  Amplitude  estimation  performance  for  double  harmonic  linear  chirp  (SNR 
=  50  dB) 


tones,  the  CRLB  of  a  particular  tone’s  frequency  estimate  depends  on  its  own  ampli¬ 
tude  but  is  independent  of  the  other  tone  amplitudes.  The  weaker  second  harmonic 
results  in  a  less  accurate  amplitude  estimate  due  to  a  less  accurate  frequency  esti¬ 
mate  in  Eq.  (3.27).  Second,  in  regions  where  a  tone’s  amplitude  is  time-varying,  the 
amplitude  estimate  is  less  accurate  because  Eq.  (3.27)  assumes  the  parameters  h^i 
are  constant  within  the  analysis  window.  The  largest  estimation  error  in  Fig.  3-8 
occurs  in  regions  where  both  the  chirp  amplitude  is  changing  and  the  corresponding 
frequency  estimate  is  less  accurate.  A  third  source  of  error  is  due  to  the  assumption 
that  the  frequencies  are  also  constant  in  Eq.  (3.27),  while  the  underlying  frequency 
contours  are  also  time- varying.  Fig.  3-9  shows  the  residual  MSE  in  the  amplitude 
estimation  problem,  computed  from  Eq.  (3.27)  as 

residual  MSE  =  — — -y— .  (3.28) 


64 


Figure  3-9:  Residual  MSE  for  double  harmonic  linear  chirp  (SNR  =  50  dB) 


The  residual  MSE  is  characterized  as  being  somewhat  periodic  and  sensitive  to  rapid 
changes  in  the  amplitude  and  frequency  contours,  with  strong  dependence  on  the 
weaker  chirp  amplitudes  due  the  corresponding  decrease  in  frequency  estimation  ac¬ 
curacy. 

Fig.  3-10  compares  the  estimated  amplitude  contours  a^s  to  the  actual  contours  in 
Eq.  (3.26)  for  SNR  =  25  dB.  The  increased  additive  white  noise  degrades  the  frequency 
and  amplitude  estimation  problems,  resulting  in  larger  deviations  from  the  underlying 
amplitude  contour  for  sustained  periods  of  time.  Fig.  3-11  shows  the  corresponding 
residual  MSE.  In  comparison  with  Fig.  3-9,  the  increased  additive  white  noise  boosts 
the  residual  MSE  while  reducing  the  relative  performance  gain  when  the  amplitudes 
are  held  constant. 
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Chapter  4 


Synthetic  Marine  Mammal  Whistle 
Calls 


This  chapter  applies  the  experience  gained  from  the  parameter  estimation  of  harmonic 
linear  chirps  in  Chapter  3  to  the  parameter  estimation,  modification  and  synthesis 
of  bottlenose  dolphin  whistle  calls.  Section  4.1  focuses  on  parameter  estimation  and 
synthesis  of  bottlenose  dolphin  whistle  calls.  Section  4.2  proposes  different  strategies 
for  watermarking  whistle  calls  based  upon  detection  capability  and  exploiting  natural 
variability  in  the  whistle  call  frequency  contours. 

4.1  Modeling  Recorded  Bottlenose  Dolphin  Whis¬ 
tle  Calls 

Fig.  4-1  shows  a  bottlenose  dolphin  whistle  call  composed  of  three  separate  whistles 
taken  from  the  Sarasota  Bottlenose  Dolphin  Whistle  Catalog  maintained  at  Woods 
Hole  Oceanographic  Institution  [62].  The  whistle  call  was  recorded  using  a  custom 
built  suction  cup  hydrophone  attached  to  the  forehead  of  the  dolphin.  The  original 
analog  recording  at  fs  =  40  kHz  was  later  digitized  using  a  sample  rate  of  fs  =  96 
kHz.  Fig.  4-2  is  a  spectrogram  of  the  bottlenose  dolphin  whistle  call  in  Fig.  4-1 
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Whistle  1 


Whistle  2 


Whistle  3 


0  6 

Time  (sec) 


Figure  4-1:  Bottlenose  dolphin  whistle  call 


Frequency  (Hz) 


Figure  4-2:  Spectrogram  of  bottlenose  dolphin  whistle  call  in  Fig.  4-1  (dB) 


computed  using  the  short-time  Fourier  transform  with  a  750  point.  Hamming  window 
and  250  samples  of  overlap  [49].  Each  whistle  contains  up  to  six  harmonics  with 
frequency  generally  increasing  throughout  the  whistle. 

The  performance  of  the  frequency  estimation  problem  is  dependent  upon  three 
parameters:  the  exponential  forgetting  factor  A,  the  number  of  data  samples  M  used 
in  each  analysis  block,  and  the  model  order  p.  To  limit  the  smoothing  effect  of 
the  analysis  window  while  achieving  optimal  frequency  matching  characteristics,  the 
values  A  =  1  and  M  =  101  are  chosen.  The  choice  of  p  is  more  complex.  If  the  whistle 
calls  were  composed  of  R  harmonics  with  stable,  relatively  equal  amplitude  contours, 
then  the  model  order  would  be  chosen  as  p  =  2 R.  In  reality,  the  higher  harmonics  are 
significantly  weaker  than  the  fundamental  harmonic,  and  in  regions  where  the  whistle 
amplitude  or  frequency  changes  rapidly,  the  amplitudes  of  each  harmonic  fluctuate 
strongly.  Due  to  the  known  harmonic  structure  of  the  whistles  and  the  relatively 
weak  amplitudes  of  higher  harmonics,  all  harmonics  are  best  estimated  as  multiples 
of  the  fundamental  harmonic,  f\.  A  low  model  order  of  p  =  2  is  chosen,  for  which  the 
frequency  of  the  strongest  harmonic  is  estimated,  because  of  the  occasional  instability 
of  the  whistle  harmonics.  However,  since  higher  harmonics  are  not  accounted  for  in 
the  model,  the  resulting  fundamental  frequency  estimate  has  a  higher  error  variance 
than  if  the  data  contained  only  the  fundamental  frequency  contour.  The  solution 
is  to  apply  a  bandpass  filter  to  isolate  the  fundamental  harmonic  from  the  higher 
harmonics  before  performing  frequency  estimation. 

The  wide  frequency  range  of  the  bottlenose  dolphin  whistle  calls  require  using 
two  overlapping  bandpass  filters  to  isolate  the  fundamental  frequency  contour.  The 
overlap  region  is  chosen  to  be  large  enough  to  allow  a  smooth  transition  between  the 
two  frequency  estimates.  The  Matlab  command  f  iltf  ilt  [45]  is  used  to  perform  zero¬ 
phasing  filtering  to  ensure  the  resulting  estimated  frequency  contours  are  correctly 
aligned  in  the  time  domain.  Fig.  4-3  shows  the  frequency  estimates  for  Whistle  1 
obtained  using  a  bandpass  filter  overlap  region  of  12-12.5  kHz  and  a  transition  time 
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between  frequency  estimates  of  237.65  msec.  The  resulting  fundamental  frequency 
contour  is  shown  in  Fig.  4-4.  The  frequency  contours  of  the  higher  harmonics  are 
fr[l\  =  rf]  [7]  for  1  <  l  <  L,  where  L  is  the  number  of  analysis  blocks  in  the  whistle. 

The  harmonic  amplitude  estimates  are  then  found  for  each  analysis  block  using 
an  estimation  width  of  W  =  20  data  points.  For  each  data  block,  the  number  of 
harmonic  amplitudes  to  be  estimated  is  specified  based  on  the  frequency  of  the  fun¬ 
damental  harmonic.  For  example,  when  the  fundamental  harmonic  exceeds  10  kHz, 
there  will  be  at  most  three  harmonics  present  due  to  the  frequency  cutoff  at  40  kHz. 
Overestimating  the  number  of  harmonics  in  the  data  gives  spurious  results.  The  esti¬ 
mated  amplitude  contours  for  Whistle  1  is  shown  in  Fig.  4-5.  It  is  important  to  keep 
in  mind  that  the  amplitude  estimates  are  performed  for  the  recorded  whistle  and 
are  not  necessarily  representative  of  the  actual  whistle,  since  the  higher  harmonics 
are  artificially  cutoff  by  the  recording  equipment  at  frequencies  greater  than  40  kHz. 
The  actual  harmonic  amplitudes  most  likely  do  not  fluctuate  as  rapidly  as  seen  in 


Figure  4-5:  Estimated  amplitude  contours  for  Whistle  1  in  Fig.  4-1 
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Fig.  4-5.  The  observed  short-time  variability  in  the  amplitude  contours  accounts  for 
model  mismatch  and  frequency  estimation  error. 

Fig.  4-6  shows  the  residual  MSE  for  Whistle  1.  The  MSE  is  remarkably  low  in  the 
middle  of  the  whistle  while  the  amplitude  contours  are  relatively  stable,  indicating 
good  frequency  and  amplitude  estimation  performance.  In  regions  where  the  whis¬ 
tle  is  less  stable,  such  as  during  the  attack  phase  at  the  beginning  of  the  whistle, 
the  parameters  vary  more  quickly,  resulting  in  worse  estimation  performance.  The 
synthetic  whistle  is  then  constructed  from  the  harmonic  frequency  and  amplitude 
contours  according  to  the  model  in  Eq.  (3.1), 


s[l]  =  Y  ar[l]  cos  (2tt  Y 


r=  1 


rf\  [i 
f. 


+  Or 


for  1  <  l  <  L. 


(4.1) 


where  ar  are  the  harmonic  amplitude  contours,  9r  are  the  initial  phases  of  each  har¬ 
monic.  and  f\  is  the  fundamental  frequency  contour.  Since  the  human  auditory  system 


Tim©  (msec) 

Figure  4-6:  Residual  MSE  for  Whistle  1  in  Fig.  4-1 
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(HAS)  is  insensitive  to  the  initial  phase,  the  synthetic  whistles  could  be  constructed 
with  0T  —  0.  but  accounting  for  the  initial  phase  difference  between  harmonics  causes 
the  synthetic  whistle  to  more  closely  resemble  the  recorded  whistle  in  the  time  do¬ 
main. 

Fig.  4-7  compares  the  recorded  and  synthetic  time  domain  representations  for 
Whistle  1.  Fig.  4-8  compares  the  spectrograms  for  the  recorded  and  synthetic  versions 
of  Whistle  1.  In-air  playbacks  using  Matlab  demonstrate  that  the  synthetic  whistle 
is  almost  indistinguishable  from  the  recorded  whistle.  However,  the  sinusoidal  model 
does  not  account  for  any  stochastic  ‘noise-like’  portions  of  the  whistle,  such  as  seen 
surrounding  the  fundamental  frequency  contour  at  the  end  of  Whistle  1  in  Fig.  4-8. 
Other  dolphin  whistles  should  be  studied  to  determine  whether  this  type  of  stochastic 
effect  is  actually  produced  by  the  dolphin. 

Figs.  4-9  through  4-12  show  the  fundamental  frequency  and  amplitude  contours 
for  Whistles  2  and  3  in  Fig.  4-1.  Each  successive  whistle  has  a  longer  duration  and  is 
characterized  by  increasingly  stable  frequency  and  amplitude  contours.  The  residual 
MSE  for  Whistles  2  and  3  is  shown  in  Fig.  4-13  and  Fig.  4-14.  Both  Whistle  2  and  3 
have  a  lower  residual  MSE  than  Whistle  1.  as  expected  based  on  the  stability  of  the 
frequency  and  amplitude  contours.  Each  whistle  has  a  higher  residual  MSE  when  the 
fundamental  frequency  is  rapidly  increasing  toward  the  end  of  the  whistle. 
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Figure  4-7:  Recorded  (top)  vs.  synthetic  (bottom)  versions  of  Whistle  1  in  Fig.  4-1 


Frequency  (Hz)x  10  Frequency  (Hz)x  10 


Figure  4-8:  Spectrograms  of  recorded  (left)  vs.  synthetic  (right)  versions  of  Whistle  1 
in  Fig.  4-1  (dB) 
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Figure  4-11:  Fundamental  frequency  contour  of  Whistle  3  in  Fig.  4-1 
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Figure  4-12:  Estimated  amplitude  contours  for  Whistle  3  in  Fig.  4-1 


4.2  Watermarked  Synthetic  Whistle  Calls 


In  a  covert  communications  scenario,  a  blind  watermark  detection  scheme  is  generally 
desired,  in  which  the  host  signal  is  not  needed  for  watermark  retrieval.  Due  to  the 
sensitivity  of  the  HAS,  a  parametric  watermark  that  produces  a  natural-sounding 
stego-signal  provides  the  best  opportunity  for  passing  embedded  information  with¬ 
out  alerting  observers  to  the  existence  of  the  information.  The  harmonic  frequency 
contours  are  chosen  as  the  parameter  set  to  be  watermarked  based  on  the  strong 
performance  of  the  sinusoidal  model  of  Eq.  (4.1)  in  representing  recorded  bottlenose 
dolphin  whistle  calls.  In  order  to  produce  natural-sounding  whistles  using  a  retriev¬ 
able  watermark,  the  harmonic  relationship  between  frequency  contours  should  be 
maintained.  Thus,  different  schemes  for  watermarking  the  fundamental  frequency 
contour  of  a  synthetic  whistle  should  be  considered  in  terms  of  the  ease  of  watermark 
detection  and  retrieval  and  the  naturalness  of  the  resulting  stego-signal. 

The  fundamental  frequency  contour  regularly  fluctuates  about  its  instantaneous 
mean  that  can  be  described  as  a  vibrato  in  the  frequency  contour.  Instead  of  adding 
distortion  on  top  of  the  observed  vibrato,  watermark  retrieval  can  be  enhanced  by 
watermarking  the  instantaneous  mean  frequency  (IMF)  contour,  f IMf ,  which  is  as¬ 
sumed  to  be  the  original  frequency  contour  if  the  vibrato  effect  did  not  occur.  The 
vibrato  can  be  thought  of  as  a  stochastic  vibration  or  watermark  fw  added  to  the 
smoothed  frequency  contour  Jimf,  so  that 

fi[l]  =  /n/Fp]  +  fw[l]i  for  1  <  l  <  L.  (4.2) 

Since  the  natural  bottlenose  dolphin  whistles  consist  of  distorted  frequency  contours, 
there  is  a  good  chance  that  robust  watermarking  methods  can  be  utilized  to  produce 
natural-sounding  synthetic  whistles.  However,  if  the  watermark  is  too  natural,  it  may 
be  difficult  to  distinguish  between  natural  and  synthetic  whistles. 

The  IMF  contour  is  found  as  the  weighted  time-average  of  the  fundamental  fre- 
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Figure  4-15:  Impulse  response  of  moving-average  filter 


quency  contour  using  a  moving- average  filter  with  the  impulse  response  shown  in 
Fig.  4-15.  The  observed  bottlenose  dolphin  whistle  vibrato  occurs  with  an  average 
period  of  roughly  1  msec,  so  the  effective  impulse  response  length  of  the  filter  is  cho¬ 
sen  to  be  about  1  msec.  The  resulting  moving- average  filter  gives  equal  weight  to 
local  frequency  estimates  while  giving  consideration  to  more  distant  values  in  order 
to  smoothly  estimate  the  IMF.  The  Matlab  command  filtfilt  [45]  is  again  used 
to  perform  zero-phase  filtering.  The  fundamental  frequency  and  IMF  contours  for  a 
portion  of  Whistle  2  are  shown  in  Fig.  4-16. 

In  a  covert  communications  scenario,  it  would  be  desirable  to  be  able  to  retrieve  the 
watermark  under  relatively  low  SNR  conditions,  such  as  SNR  =  5  dB.  This  requires 
a  relatively  robust  watermarking  scheme  that  facilitates  watermark  retrieval  even 
when  frequency  estimation  performance  is  relatively  bad.  Liu’s  F-QIM  watermarking 
scheme  [38],  which  is  based  on  detecting  the  difference  between  separate  frequency 
quantizers,  would  require  either  large  frequency  deviations  between  quantizer  levels 
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Figure  4-16:  Fundamental  frequency  and  IMF  contours  for  a  portion  of  Whistle  2  in 
Fig.  4-1 

or  high  SNR  to  ensure  robust  watermark  retrieval  due  to  the  frequency  estimation 
performance.  The  remainder  of  this  chapter  considers  two  watermarking  schemes  that 
are  relatively  robust  for  a  range  of  SNR.  The  first,  scheme  constructs  a  watermark 
composed  of  linear  chirp  segments  separated  by  an  abrupt  frequency  shift.  The  second 
scheme  constructs  a  watermark  that  simulates  the  natural  vibrato  of  the  fundamental 
frequency  using  continuous-phase  modulation  (CPM). 

4.2.1  Linear  Chirp  Segments  With  Abrupt  Frequency  Shifts 

The  goal  of  most  communications  systems  is  to  maximize  the  achievable  data  rate 
for  which  transmitted  information  can  be  reliably  decoded.  This  implies  that  each 
information  bit  will  correspond  to  a  minimal  number  of  samples  in  the  transmitted 
signal.  Thus,  from  the  perspective  of  data  rate,  an  optimal  frequency  watermarking 
scheme  will  have  a  relatively  low  number  of  samples  per  information  bit  available  for 
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frequency  estimation.  Increasing  the  sample  rate  at  the  receiver  will  also  generally 
improve  frequency  estimation  performance  by  providing  more  samples  per  information 
bit,  but  it  is  assumed  fixed  when  choosing  a  watermarking  scheme.  At  low  SNR, 
small  changes  in  the  frequency  contour  may  be  obscured  by  the  increased  frequency 
estimation  variance,  making  robust  QIM-based  watermarking  schemes  unattractive 
in  terms  of  perceptual  distortion  of  the  host  signal.  To  improve  frequency  estimation 
performance  and  limit  perceptual  distortion,  the  IMF  contour  should  be  watermarked 
with  a  generally  smoothly-varying  signal  that  can  be  tracked  over  time  using  the 
HTLS  frequency  estimator  or  other  frequency  estimators. 


Figure  4-17:  Watermarking  scheme  based  on  linear  chirp  segments  with  abrupt  fre¬ 
quency  shifts 

A  potential  watermarking  scheme,  portrayed  in  Fig.  4-17,  approximates  the  IMF 
contour  using  linear  chirp  segments  with  abrupt  frequency  shifts  A/.  The  water¬ 
marked  information  is  encoded  in  the  amount  of  time  between  abrupt  frequency 
shifts,  At,0  and  Af  i .  The  slope  of  each  linear  chirp  segment  is  chosen  to  achieve  a  fre- 
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quency  separation  of  A /  from  the  IMF  contour  after  a  duration  A t0  or  Atx  specified 
by  each  information  bit.  The  synthetic  stego-signal  is  then  constructed  according  to 
Eq.  (4.1)  using  the  watermarked  fundamental  frequency  contour  and  the  amplitude 
contours  estimated  using  the  original  fundamental  frequency  contour  estimate.  An 
alternative  to  the  watermarking  scheme  in  Fig.  4-17  is  to  tag  the  midpoint  instead  of 
the  initial  point  of  each  linear  chirp  segment  to  the  IMF  contour. 

Fig.  4-18  shows  the  linear  chirp  watermarked  fundamental  frequency  contour  based 
on  Whistle  2  of  Fig.  4-1.  The  watermarked  contour  was  constructed  using  a  random 
information  bit  stream  and  the  parameters  A /  =  150  Hz,  Ato  =  1  msec  and  A t\  =  2 
msec.  In-air  playbacks  using  Matlab  demonstrate  that  there  is  a  small  perceptible 
difference  between  the  recorded  and  watermarked  synthetic  whistles.  The  parameter 
that  most  effects  the  perceptible  distortion  of  the  host  signal  is  the  frequency  shift, 
A/.  At  relatively  high  SNR,  the  frequency  estimation  performance  will  be  improved, 
and  thus  require  a  smaller  A /  for  reliable  watermark  retrieval.  As  SNR  decreases, 


Figure  4-18:  Linear  chirp  watermarked  frequency  contour  of  Whistle  2  in  Fig.  4-1 
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the  frequency  estimation  variance  increases,  and  a  larger  A /  is  needed  to  differenti¬ 
ate  between  an  actual  frequency  shift  and  estimation  error.  Watermark  retrieval  is 
performed  by  detecting  abrupt  frequency  shifts  in  the  fundamental  frequency  contour 
of  the  received  whistle. 

4.2.2  Continuous  Phase  Modulation 

Due  to  the  inherent  vibrato  observed  in  the  bottlenose  dolphin  whistle  calls,  an  alter¬ 
native  to  the  linear  chirp  watermarking  scheme  is  to  embed  information  in  a.  synthetic 
vibrato  using  continuous  phase  modulation  (CPM)  as  shown  in  Fig.  4-19.  CPM  sig¬ 
nals  [52]  have  a  continuous  carrier  phase 

n 

d)(t ;  I)  =  2tt  Ikhkq{t  -  kT),  nT  <  t  <  (n  +  1  )T  (4.3) 

k=—o o 

where  {4}  is  a  sequence  of  M- ary  information  symbols  selected  from  the  alphabet 
±1,  ±3,  . . . ,  ±(M  —  1),  {/?*}  is  a  sequence  of  modulation  indices,  and  q(t)  is  some 
normalized  waveform  shape.  While  many  types  of  CPM  could  be  used  to  construct  a 
synthetic  whistle  vibrato,  a  simple  type  called  minimum-shift  keying  (MSI<)  can  be 
used  to  illustrate  a  watermarking  scheme  using  CPM. 
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Figure  4-19:  Watermarking  scheme  based  on  CPM  perturbation  of  the  IMF  contour 
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MSK  is  a  special  form  of  binary  CPM  in  which  the  modulation  index  h  =  \  and 


normalized  waveform  shape 


0  (t  <  0) 

q(t)  =  t/2T  (0  <  t  <  T)  ■ 

1/2  (or) 


The  phase  of  the  MSK  carrier  in  the  interval  nT  <t<  (n  +  1  )T  is 


<f)(t;  I)  =  —  7T  Ik  +  7ilnq(t  -  nT) 


1  /  £  tiT  \ 

T,  +  727lIn  (  7I'  j ,  nT  <t  <  ( n  +  1  )T, 


where 


@n  —  ^  Ik 


The  modulated  MSK  carrier  signal  with  amplitude  A  and  carrier  frequency  fc  is 


s(t)  =  A  cos  2tt/T  +  0n  +  -nln  (  — y—  J 

=  A  COS  27T  (fc  +  t  ~  \ n7;In  +  ® n  ,  r>T  <  t  <  (fl  +  1  )T . 


From  Eq.  (4.7),  it  can  be  seen  that  for  each  interval  nT  <t<(n+  1  )T,  MSK  can  be 
thought  of  as  having  one  of  two  frequencies, 

./o  J C  Arp 

f  (4.8) 

f  i  /c  T  5 

with  an  adjusted  phase  to  achieve  a  continuous  phase  across  all  intervals. 


The  synthetic  vibrato  signal  fw[l\  can  be  constructed  by  sampling  Eq.  (4.7)  at 
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the  points  t  =  l/fs  with  a  carrier  rate  of  fc  =  1/T, 

fw[l]  =  ^4  cos  ^  ^1  +  “  -  ^mr/n  +  0n  .  nT fs  <  l  <  (n  +  1  )Tfs,  (4.9) 

where  { I „  }  is  a  sequence  of  binary  information  symbols  ±1.  The  CPM  watermarked 
fundamental  frequency  contour  is 

fcPM [2]  =  flMF [^]  +  fw[l]-  1  <  l  <  L.  (4-10) 

Fig.  4-20  shows  both  the  unmodified  and  CPM-watermarked  fundamental  fre¬ 
quency  contours  for  a  portion  of  Whistle  2  in  Fig.  4-1,  where  fcPM  is  constructed 
using  the  parameters  A  =  50  Hz  and  T  =  1  msec.  The  main  distinguishing  feature 
between  the  two  fundamental  frequency  contours  is  that  the  watermarked  contour 
vibrato  has  a  constant  amplitude  as  opposed  to  the  variable  strength  vibrato  in  the 


Figure  4-20:  Unmodified  and  CPM-watermarked  frequency  contours  for  a  portion  of 
Whistle  2  in  Fig.  4-1 


recorded  whistle.  In-air  playbacks  using  Matlab  demonstrate  that  the  watermarked 
whistle,  constructed  from  Eq.  (4.1)  using  the  unmodified  amplitude  contour  estimates, 
is  essentially  imperceptible  from  the  recorded  whistle,  with  the  exception  of  slight 
background  noise  in  the  recorded  whistle.  Proakis  [52]  covers  CPM  demodulation 
methods  that  can  be  used  for  watermark  retrieval  after  estimating  the  fundamental 
frequency  contour  of  the  received  whistle. 
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Chapter  5 


Experimental  Results 


This  chapter  presents  results  from  the  Rescheduled  Acoustic  Communications  Ex¬ 
periment  (RACE08)  conducted  in  Narragansett  Bay  during  March  2008.  Synthetic 
whistle  calls  based  on  the  bottlenose  dolphin  whistle  call  in  Fig.  4-1  were  transmit¬ 
ted  throughout  the  experiment.  The  frequency  estimation  performance  of  the  HTLS 
algorithm  is  demonstrated  for  both  natural  and  watermarked  frequency  contours. 


5.1  RACE08  Description 

RACE08  was  conducted  at  the  University  of  Rhode  Island’s  Narragansett  Bay  Cam¬ 
pus,  shown  in  Fig.  5-1,  from  March  1st  through  March  25th.  Acoustic  signals  were 
transmitted  from  a  stationary  tripod  located  roughly  100  meters  from  shore  in  water 
depth  of  9  meters.  The  primary  source  transducer,  an  ITC-1007  spherical  transducer 
with  resonant  frequency  of  approximately  11kHz,  was  located  about  4  meters  from 
the  sea  floor.  A  source  array  composed  of  three  Datasonics  AT-12ET  transducers,  lo¬ 
cated  beneath  the  ITC-1007,  was  not  used  for  transmitting  synthetic  whistles.  Three 
main  receiver  array  tripods  were  located  roughly  400  meters  to  the  East,  400  meters 
to  the  North,  and  1000  meters  to  the  North  of  the  source  array  tripod.  The  400  meter 
receiver  arrays  were  composed  of  24  elements  with  5  cm  spacing.  The  1000  meter 
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Figure  5-1:  University  of  Rhode  Island's  Narragansett  Bay  Campus 


receiver  array  was  composed  of  12  elements  with  12  cm  spacing.  The  bottom  element 
of  each  receiver  array  was  located  2  meters  above  the  sea  floor.  The  water  depths 
between  source  and  receiver  arrays  ranged  from  9  to  14  meters.  A  reference  ITC-100 
hydrophone  was  mounted  1  meter  from  the  ITC-1007  source  transducer.  The  sample 
rate  of  the  transmitter  and  all  receivers  was  39062.5  Hz  ( le7 / 256) . 


5.2  RACE08  Results 

Synthetic  whistle  calls,  based  on  the  bottlenose  dolphin  whistle  call  in  Fig.  4-1,  were 
transmitted  on  the  ITC-1007  source  transducer  at  two  hour  intervals  throughout 
the  RACE08  experiment.  The  results  presented  here,  taken  from  the  8:00  P.M.  EDT 
transmission  on  March  23rd,  were  chosen  for  relatively  calm  environmental  conditions 
in  Narragansett  Bay. 

Fig.  5-2  compares  spectrograms  of  unmodified  synthetic  whistle  calls  received  at 
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Figure  5-2:  Spectrograms  of  unmodified  synthetic  whistle  calls  received  at  the  refer¬ 
ence  (left)  and  N1000  (right)  hydrophones  (dB) 


the  reference  and  North  1000  meter  (N1000)  hydrophones.  The  reference  hydrophone 
records  the  whistle  call  without  multipath  or  intersymbol  interference  (ISI).  while  the 
N1000  hydrophone  sees  an  impulse  response  of  length  greater  than  0.5  seconds.  The 
relatively  long  impulse  response  is  due  to  strong  reflections  from  shore  in  the  narrow 
channel. 

Fig.  5-3  compares  spectrograms  of  watermarked  synthetic  whistle  calls  received 
at  the  reference  and  NT  000  hydrophones.  The  watermarking  scheme  was  similar  to 
that  portrayed  in  Fig.  4-17.  except  that  the  frequency  was  held  constant  for  each 
information  bit,  resulting  in  a  variable  abrupt  frequency  shift  A/.  The  parameters 
Afn  =  10.2  msec  and  A t\  =  20.4  msec  were  chosen  for  initial  testing  to  ensure 
frequency  estimation  and  watermark  retrieval  could  be  demonstrated.  The  presence  of 
the  watermark  is  clearly  seen  at  the  N1000  hydrophone  in  Fig.  5-3,  since  the  multipath 
energy  only  appears  at  discrete  frequencies  determined  by  the  watermarking  scheme. 


89 


Frequency  (Hz)x  10 


Frequency  (Hz)x  10 


Figure  5-3:  Spectrograms  of  watermarked  synthetic  whistle  calls  received  at  the  ref¬ 
erence  (left)  and  N1000  (right)  hydrophones  (dB) 
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Figure  5-4:  Reference  hydrophone  recording  of  unmodified  Whistle  3  in  Fig.  5-2 


Fig.  5-4  shows  the  third  whistle  (Whistle  3)  from  Fig.  5-2  as  recorded  by  the 
reference  hydrophone.  Due  to  the  frequency  response  of  the  ITC-1007  transducer, 
the  amplitude  of  Whistle  3  varies  in  time  as  the  frequency  changes.  The  rest  of  this 
chapter  examines  the  frequency  estimation  performance  of  the  HTLS  algorithm  for 
both  unmodified  and  watermarked  versions  of  Whistle  3. 

Fig.  5-5  compares  the  frequency  estimation  performance  for  both  unmodified  and 
watermarked  versions  of  Whistle  3  received  by  the  reference  hydrophone,  using  the 
parameters  A  =  1,  M  =  101,  and  p  =  2.  A  major  drawback  of  this  watermarking 
scheme  is  that  when  the  unmodified  frequency  contour  is  relatively  constant,  there  is 
little  frequency  separation  between  information  bits,  and  watermark  retrieval  requires 
excellent  frequency  estimation.  By  using  linear  chirp  segments  with  abrupt  frequency 
shifts  A/,  robust  watermark  retrieval  is  possible  independent  of  the  unmodified  fre¬ 
quency  contour. 


Figure  5-5:  Frequency  estimation  performance  for  unmodified  (left)  and  watermarked 
(right)  whistle  contours  received  at  reference  hydrophone 
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Fig.  5-6  compares  the  frequency  estimation  performance  for  both  unmodified  and 
watermarked  versions  of  Whistle  3  received  by  the  N1000  hydrophone,  using  the 
parameters  A  =  1,  M  =  101,  and  p  =  2 R  with  up  to  3  harmonics.  The  effect  of  ISI  is 
combatted  by  increasing  the  model  order  to  account  for  major  peaks  in  the  impulse 
response,  yielding  good  frequency  estimation  of  the  transmitted  contour.  However, 
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Figure  5-6:  Frequency  estimation  performance  for  unmodified  (left)  and  watermarked 
(right)  whistle  contours  received  at  N1000  hydrophone 


overestimating  the  model  order  harms  the  frequency  estimation  performance,  so  p 
was  manually  adjusted  to  account  for  the  onset  of  strong  multipath  arrivals. 

Fig.  5-7  and  Fig.  5-8  show  the  complete  estimated  frequency  contours  for  unmod¬ 
ified  and  watermarked  versions  of  Whistle  3  received  by  the  N1000  hydrophone.  As 
seen  in  Fig.  5-7,  ISI  can  cause  sudden  spurious  frequency  estimation  results.  Dis¬ 
counting  the  outliers  in  Fig.  5-7,  the  standard  deviation  of  the  unmodified  whistle 
frequency  estimate  is  21.6  Hz,  while  the  standard  deviation  of  the  watermarked  fre¬ 
quency  estimate  is  20.8  Hz. 
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5-7:  Frequency  estimation  performance  for  unmodified  whistle  contour  re- 
at  N1000  hydrophone 


Figure  5-8:  Frequency  estimation  performance  for  watermarked  whistle  contour  re¬ 
ceived  at  N1000  hydrophone 


Although  the  distortion  due  to  ISI  presents  a  challenge  to  watermark  retrieval, 
it  can  be  overcome  in  mild  environmental  conditions  with  clearly  defined  multipath 
arrivals  by  appropriately  increasing  the  model  order  used  in  frequency  estimation. 
In  severe  environmental  conditions,  where  the  multipath  arrivals  reflected  off  surface 
waves  are  less  clearly  defined,  the  frequency  estimation  performance  will  degrade. 
Further  testing  with  the  watermarking  schemes  presented  in  Section  4.2  should  be 
performed  in  various  environmental  and  bathymetric  conditions  to  establish  the  op¬ 
erational  limits  on  robust  watermark  retrieval. 
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Chapter  6 


Conclusions  and  Future  Directions 


The  work  presented  in  this  thesis  develops  a.  method  for  high-resolution  modeling 
of  marine  mammal  whistle  calls  that  can  be  used  to  generate  natural  sounding  syn¬ 
thetic  whistles  for  biological  research  or  covert  communications.  Although  McAulay 
and  Quatieri  [46]  reported  good  results  in  applying  their  human  speech  processing 
sinusoidal  model  to  the  synthesis  of  whale  sounds,  their  technique  was  based  on  a 
block-by-block  estimation  of  slowly-varying  parameters.  By  applying  a  relatively 
short  sliding  window  with  hop  size  of  H  =  1,  the  quickly- varying  parameters  of 
chirp  signals  can  be  accurately  estimated.  Essentially,  higher  resolution  estimates 
are  found  for  the  fundamental  frequency  and  amplitude  contours  used  by  Buck  et 
al.  [5]  in  the  modification  and  synthesis  of  bottlenose  dolphin  whistle  calls.  Due 
to  the  sensitivity  of  the  HAS,  the  optimal  scheme  for  watermarking  marine  mam¬ 
mal  whistle  calls  is  based  on  slight  imperceptible  modifications  of  the  fundamental 
frequency  contour.  High-resolution  frequency  estimation  is  essential  for  producing 
natural  sounding  st.ego-signals  that  are  robust  to  channel-induced  signal  distortion 
and  additive  ambient  noise. 

An  interesting  result,  previously  unknown  due  to  the  lower  resolution  of  other  tech¬ 
niques,  is  that  the  bottlenose  dolphin  whistles  exhibit  an  inherent  fluctuating  vibrato 
of  the  fundamental  frequency  contour,  presumably  due  to  the  physical  mechanism 
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for  generating  whistles.  A  typical  vibrato  of  the  bottlenose  dolphin  fundamental  fre¬ 
quency,  ranging  from  6  to  22  kHz,  has  a  period  of  1  msec  with  a  magnitude  from  50  to 
100  Hz.  The  presence  and  resolvability  of  the  inherent  vibrato  naturally  lead  to  wa¬ 
termarking  the  instantaneous  mean  fundamental  frequency  contour  with  a  synthetic 
vibrato  using  CPM  signals. 

Directions  for  future  work  can  be  divided  into  two  categories:  updating  the  ex¬ 
isting  model  to  better  describe  marine  mammal  whistle  generation  and  addressing 
operational  aspects  of  a  covert  communications  system.  The  major  distinction  be¬ 
tween  these  categories  is  that  modeling  can  performed  offline  at  ideal  SNRs,  while  a 
covert  communications  system  will  optimally  operate  online  at  degraded  SNRs. 

Accurate  modeling  of  marine  mammal  whistle  calls  requires  high-quality  record¬ 
ings  with  a  high  SNR  and  sufficient  sample  rate  to  capture  the  desired  harmonics 
without  aliasing.  The  custom  built  suction  cup  hydrophone,  used  in  the  Sarasota 
Bottlenose  Dolphin  Whistle  Catalog  to  record  whistles  during  brief  capture-release 
events,  provides  recordings  with  excellent  SNR.  For  the  whistle  recording  studied  in 
this  thesis,  the  high  frequency  harmonics  are  cutoff  above  40  kHz.  Optimal  recordings 
should  use  a  high  enough  sample  rate  to  resolve  the  desired  harmonics  and  employ 
anti-aliasing  filters  to  limit  whistle  distortion.  A  large  number  of  bottlenose  dolphin 
whistle  calls  should  be  analyzed  to  determine  characteristic  modulations  of  the  fre¬ 
quency  and  amplitude  contours.  If  these  characteristics  can  be  accurately  modeled, 
natural  sounding  whistles  can  be  generated  from  scratch,  without  requiring  a  whistle 
recording  to  develop  frequency  and/or  amplitude  contours.  The  existing  sinusoidal 
model  could  be  updated  to  include  components  of  the  whistles  that  are  not  confined 
to  narrow  band  harmonics.  The  apparent  stochastic  effects  of  the  whistles,  such  as 
during  the  attack  or  final  phases  of  the  whistles,  could  be  modeled  in  a  similar  fash¬ 
ion  as  Levine’s  sinusoid+noise+transient  model  [36].  Finally,  the  bottlenose  dolphin 
vocal  tract  could  be  modeled  to  improve  the  sinusoidal  synthesis  model,  as  shown  in 
Fig.  1-1. 
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One  of  the  drawbacks  for  using  the  HTLS  algorithm  to  track  fundamental  fre¬ 
quency  contours  in  a  covert  communications  system  is  the  high  computational  load 
required  to  obtain  a  frequency  estimate  for  each  sample.  A  recursive  implementa¬ 
tion  of  the  weighted  HTLS  algorithm,  using  an  appropriate  forgetting  factor  A  to 
discard  old  data,  would  greatly  improve  the  algorithm’s  computational  load  for  real 
time  applications.  Liang  [37]  discusses  using  the  SVD-update  algorithm  of  Bunch 
and  Nielsen  [7]  after  calculating  the  initial  SVD  to  reduce  the  computational  loading 
of  sequential  chirp  parameter  estimation.  Taking  advantage  of  the  state-space  model 
utilized  in  the  HTLS  algorithm,  an  extended  Kalman  filter  [22]  could  be  developed  to 
track  parameter  changes  throughout  a  whistle  call.  It,  would  be  beneficial  to  develop 
a  more  robust  way  to  deal  with  channel-induced  ISL  such  as  using  the  Expectation- 
Maximization  (EM)  algorithm  [48]  to  estimate  channel  conditions  and  performing 
channel  equalization  prior  to  frequency  estimation.  It  could  also  turn  out  that  other 
frequency  estimators,  such  as  Quinn's  FTI  frequency  estimator,  are  a  better  choice 
than  the  HTLS  algorithm  for  watermark  detection  and  retrieval.  Quinn  [55]  com¬ 
bines  FTI  frequency  estimation  with  a  Hidden  Markov  Model  (HMM)  to  track  slowly 
varying  frequencies  at  low  SNR.  HMMs  could  be  developed  to  improve  freequency 
tracking  of  marine  mammal  whistle  calls  at  low  SNR. 

Different  watermarking  schemes  should  be  tested  and  compared  in  terms  of  their 
ability  to  produce  natural  sounding  synthetic  stego- signals,  potential  achievable  data 
rates,  and  watermark  detection  and  retrieval  performance.  While  this  thesis  focused 
on  the  frequency  of  estimation  of  a  single  marine  mammal  whistle  call,  an  operational 
environment  at  sea  will  often  include  actual  marine  mammal  whistle  calls  in  addition 
to  the  synthetic  stego-signal.  Sturtivant  and  Datta  [67]  have  looked  at  extracting 
whistle  contours  from  recordings  of  several  dolphins.  An  eventual  covert  communica¬ 
tions  system  will  most  likely  need  to  be  able  to  overcome  acoustic  interference  from 
biologies  that  respond  to  the  natural  sounding  stego-signals. 
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Appendix  A 


Prony’s  Derivation  of  the  Linear 
Prediction  Equations 


Pronv  demonstrated  that  the  nonlinear  aspects  of  Eq.  (2.3), 

4«]  =  X>~r'  •  (A.n 

k= 1 

can  be  embedded  into  a  polynomial  factorization  problem  [43].  He  showed  that  the 
poles  Zf,.  can  be  resolved  separately  from  the  parameters  hk,  which  can  then  be  found 
by  solving  Eq.  (2.6).  The  key  to  the  separation  is  to  recognize  that  Eq.  (A.l)  is  the 
solution  to  a  homogeneous  linear  constant-coefficient  difference  equation.  In  order  to 
find  the  form  of  this  difference  equation,  first  define  the  polynomial  4>{z)  that  has  the 
poles  zk  as  its  roots, 

p 

<i>{z) = -  **)  ■  ( a -2) 

k= 1 

If  the  products  of  Eq.  (A. 2)  are  expanded  into  a  power  series,  the  polynomial  may  be 
represented  as  the  summation, 

v 

<j>(z)  =  'STw[m]z?-m  ,  (A. 3) 

?n= 0 
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with  complex  coefficients  w[rn]  such  that  te[0]  =  1.  Shifting  the  index  in  Eq.  (A.l) 
from  n  to  n  —  rn  and  multiplying  by  the  parameter  w[m)  yields 

p 

w[rn]x[n  —  m]  —  w[m ]  ^  .  (A. 4) 

Forming  similar  products  ty[0]x[n], ....  iv\p]x[n  —  p]  and  summing  produces 

wHx[n  -  m\  =  5Z  Mrm_1 

m— 0  777—0  1 

=  ■  (a.5) 

k=  1  rn—0 


which  is  valid  for  p  +  1  <  n  <  2p.  Making  the  substitution  z™  rn  1  =  z£  p  1  zpk 
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w[m]x[n  —  m 
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=  0  . 


(A. 6) 


Ecp  (A. 6)  is  the  linear  difference  equation  whose  homogeneous  solution  is  given  by 
Eq.  (A.l).  Eq.  (A. 3)  is  the  characteristic  equation  associated  with  this  linear  differ¬ 
ence  equation.  The  set  of  valid  linear  prediction  equations  is  expressed  as 


x\p\  x\p  —  1]  ...  x[l] 

u;[l] 

1 . " . . 

H 

+ 

1 _ 

x[p  +  1]  x\p]  . . .  x\2] 

w[2\ 

=  — 

x\p  +  2] 

x[2 p  -  1]  x[2p  —  2]  ...  x\p] 

w[p\  _ 

- 1 

CN 

H 

(A. 7) 


Although  it  is  derived  from  different  assumptions,  the  modern  Prony’s  method,  which 
accounts  for  error  in  the  model,  is  equivalent  to  the  covariance  method  of  linear 
prediction  [41], 
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